INTRODUCTION

Lung  cancer  is  the  leading  cause  of  cancer  death  in  the  world.  Non-small  cell  lung  cancer 
(NSCLC)  accounts  for  85%  of  all  lung  cancer  cases.  Only  15%  of  patients  diagnosed  with  lung 
cancer  survive  five  years  from  diagnosis.  Therapy  for  advanced  disease  increases  average  life 
expectancy  by  only  a  few  months,  and  slightly  improves  quality  of  life.  Similarly,  adjuvant 
chemotherapy  for  resected  disease  has  only  a  modest  impact  on  survival  rates.  More  effective 
therapy  is  needed.  We  believe  that  applying  state-of-the-art  molecular  tools  to  carefully 
conducted  clinical  trials  will  lead  to  the  identification  of  molecular  mechanisms  that  contribute  to 
lung  cancer  therapeutic  resistance  and  that  drive  prognosis,  and  that  this  in  turn  will  lead  to  the 
development  of  drugs  with  novel  biological  and  therapeutic  functions.  Therefore,  we  have 
undertaken  a  translational  research  program  named  PROSPECT:  Profiling  of  Resistance 
Patterns  &  Oncogenic  Signaling  Pathways  in  Evaluation  of  Cancers  of  the  Thorax  and 
Therapeutic  Target  Identification.  The  goal  of  PROSPECT  is  to  use  therapeutic  target-focused 
(TTF)  profiling  along  with  genome-wide  mRNA  and  serum  phosphopeptide  profiling  to  identify 
and  evaluate  molecular  targets  and  pathways  that  contribute  to  therapeutic  sensitivity  or 
resistance,  prognosis,  and  recurrence  patterns,  and  to  use  this  information  to  guide  formulation 
of  new  rational  therapeutic  strategies  for  NSCLC  and  mesotheliomas.  In  the  Program,  we  have 
5  research  projects  and  3  Cores  to  address  3  central  issues:  therapeutic  resistance,  prognosis 
and  new  therapeutic  targets  and  strategies. 

PROGRESS  REPORT  (BODY): 

Project  1:  Therapeutic  target-focused  (TTF)  profiling  for  the  identification  of  molecular 
targets  and  pathways  that  contribute  to  drug  sensitivity  or  resistance  in  vitro  and  the 
development  of  rational  treatment  strategies  for  NSCLC. 

(Leader:  Dr.  John  Heymach;  Co-Leader:  Dr.  John  Minna) 

Hypotheses: 

We  hypothesize  that  a  broad,  systematic  molecular  profiling  of  NSCLC  cell  lines,  using  both 
TTF  and  global  approaches,  will  lead  to  the  following  results: 

1. The identification of new potential therapeutic targets for NSCLC

2.  The  development  of  predictive  markers  for  in  vitro  sensitivity  to  targeted  agents,  which  will 
form  the  starting  point  for  the  development  of  a  predictive  model  of  in  vivo  sensitivity  using 
clinical  specimens  as  described  in  Aim  3. 

3.  Insights  into  the  molecular  mechanism  underlying  therapeutic  resistance  and  into  the 
relationship  of  resistance  mechanisms  to  factors  innately  affecting  tumor  growth  rate  and 
prognosis 

4.  Identification  of  readily  translatable  therapeutic  strategies  to  combat  these  resistance 
mechanisms. 

Specific  Aims: 

In  this  project,  we  will  develop  and  validate  a  novel  therapeutic  target-focused  (TTF)  profiling 
platform  at  M.D.  Anderson  Cancer  Center.  The  platform  will  provide  a  high  throughput, 
quantitative,  scalable,  and  highly  sensitive  set  of  assays  to  assess  activation  of  key  signaling 
pathways  (e.g.,  PI3K/AKT,  STAT,  RAS-RAF-ERK)  as  well  as  other  potential  therapeutic  targets 
such  as  receptor  tyrosine  kinases  (RTKs).  It  will  be  coupled  with  global  profiling  of  gene 
expression  using  Affymetrix  2.0  array.  These  molecular  profiles  will  then  be  coupled  with 
information  from  a  broad  drug  and  therapeutic  target  siRNA  (DATS)  screen  to  develop  markers 
for  predicting  drug  sensitivity  in  vitro  based  on  molecular  profiles,  elucidate  the  molecular 
determinants of sensitivity or resistance to a given therapeutic agent, and identify potential
therapeutic targets for tumor cells resistant to a given agent. This project lays the foundation for
Project  3,  where  the  same  TTF  and  global  profiling  approaches  will  be  used  to  characterize 
clinical  tumor  specimens  and  investigate  molecular  markers  identified  in  this  project,  for  Project 
4,  in  which  the  profiles  and  therapeutic  targets  for  mesothelioma  will  be  explored,  and  for 
Project  2,  in  which  the  profiles  will  be  correlated  with  patient  prognosis  and  metastatic  patterns. 
The  specific  aims  of  this  project  are  as  follows: 

Specific  Aim  1:  To  develop  a  TTF  profile  for  assessing  critical  signaling  pathways  and 
potential  therapeutic  targets,  and  to  apply  TTF  and  gene  expression  profiling  to  NSCLC 
and  mesothelioma  cell  lines. 

1.1. Development and technical validation of a TTF profile using reverse-phase protein (lysate)
arrays (RPPA) and multiplexed bead array technology.

1.2.  Application  of  TTF  profiling  to  a  cell  line  panel  representing  malignant  (NSCLC  and 
mesothelioma)  and  non-malignant  (endothelial  and  stromal  cells,  normal  bronchial  epithelium) 
cell  types. 

1.3. Gene expression profiling of the cell line panel using Affymetrix microarrays.

1.4.  Correlation  of  TTF  and  gene  expression  profiles  from  the  cell  line  panel  to  determine  gene 
expression  signatures  that  correlate  with  activation  of  individual  proteins  (e.g.,  EGFR  activation) 
and  critical  signaling  pathways  (e.g.,  RAS  pathway  activation). 

Specific  Aim  2:  To  determine  the  sensitivity  of  the  cell  line  panel  to  the  selected  drug  and 
therapeutic  target  siRNA  (DATS)  screen. 

2.1.  Screening  of  the  cell  line  panel  for  sensitivity  to  a  panel  of  20-25  targeted  agents  and 
standard  chemotherapy  agents. 

2.2.  Screening  of  the  cell  line  panel  using  siRNA  representing  potential  therapeutic  targets, 
including molecules targeted by specific agents in Aim 2.1 (e.g., EGFR and IGF-1R) and
potential  therapeutic  targets  for  which  drugs  are  not  currently  available  (e.g.,  RTKs  for  which 
drugs  are  currently  in  development). 

2.3.  Comparison  of  in  vitro  and  in  vivo  profiles  (TTF  and  global)  and  drug  sensitivity  in  selected 
NSCLC  cell  lines  and  xenografts  grown  from  the  same  lines. 

Specific  Aim  3:  Development  of  markers  for  predicting  drug  and  targeted  siRNA 
sensitivity  in  vitro  based  on  TTF  and  molecular  profiles,  and  identification  of  candidate 
therapeutic  targets  in  chemotherapy-resistant  lines. 

Summary  of  Research  Findings 

In  March  2009,  Dr.  Li  Mao  (former  Co-Leader)  left  the  institution  to  accept  a  position  at  the 
University  of  Maryland  at  Baltimore;  therefore,  Dr.  Heymach  has  assumed  responsibility  for 
completion  of  Dr.  Mao’s  proposed  studies. 

Over  the  past  year,  we  have  expanded  our  set  of  NSCLC  cell  lines  and  have  completed  gene 
expression and protein profiling on all of these cell lines. The mRNA expression data from each
of 50 NSCLC lines were correlated with various drug response phenotypes, and we identified
signatures predictive of response. Likewise, proteomic profiles were correlated with in vitro drug
response data for a variety of drugs. We are in the process of validating these findings in
xenografts and in clinical samples from patients treated with these drugs.

Database of preclinical molecular profiles including mRNA gene expression and proteomic
profiles of NSCLC cell lines, tumor specimens, and xenografts.

mRNA profiling of NSCLC cell lines and tumors. We have already performed genome-wide mRNA
expression profiling using
Affymetrix HGU133A, B, or Plus2 or Illumina WG6-v2 gene chips for more than 50 NSCLC and
30  SCLC  lung  cancer  lines,  5  immortalized  human  bronchial  epithelial  cell  strains  (HBECs),  and 
more  than  40  NSCLC  xenografts.  This  activity  will  be  extended  to  include  the  full  set  of  ~70 
NSCLC  lines  in  the  bank.  We  will  also  assess  the  molecular  profiles  of  40  heterotransplants  for 
validation  of  predictive  signatures.  It  is  worth  noting  that  we  found  few  differences  in  the  gene 
expression  profiles  between  NSCLC  cell  lines  grown  in  vitro  or  in  vivo  (either  subcutaneously  or 
orthotopically  in  the  lung);  in  an  unsupervised  clustering  analysis,  each  tumor  line  grouped  with 
itself  (tissue  culture,  subcutaneous,  or  orthotopic  xenograft)  rather  than  other  tumor  lines, 
illustrating  that  a  cell  line  can  be  linked  to  an  in  vivo  profile. 
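
The observation that each tumor line groups with itself can be checked with a simple
nearest-neighbor test. The sketch below is not the clustering analysis used in this project;
it is a simplified stand-in that uses correlation distance (1 minus Pearson r) and asks whether
the closest other sample comes from the same line. The line names, growth conditions, and
profile values are invented for illustration.

```python
import math

def correlation_distance(a, b):
    """Return 1 - Pearson r between two equal-length expression profiles."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                     sum((y - mean_b) ** 2 for y in b))
    if norm == 0:
        raise ValueError("a constant profile has no defined correlation")
    return 1.0 - cov / norm

def fraction_grouping_with_own_line(samples):
    """samples: list of (line_name, growth_condition, profile).

    Returns the fraction of samples whose nearest other sample
    (by correlation distance) belongs to the same cell line.
    """
    hits = 0
    for i, (line_i, _, profile_i) in enumerate(samples):
        others = [j for j in range(len(samples)) if j != i]
        nearest = min(others,
                      key=lambda j: correlation_distance(profile_i, samples[j][2]))
        hits += samples[nearest][0] == line_i
    return hits / len(samples)

# Hypothetical toy data: two lines, each profiled under three growth conditions.
toy_samples = [
    ("line_X", "tissue culture", [1.0, 2.0, 3.0, 4.0]),
    ("line_X", "subcutaneous",   [1.1, 2.0, 3.2, 3.9]),
    ("line_X", "orthotopic",     [0.9, 2.1, 2.9, 4.2]),
    ("line_Y", "tissue culture", [4.0, 3.0, 2.0, 1.0]),
    ("line_Y", "subcutaneous",   [4.1, 2.9, 2.2, 1.0]),
    ("line_Y", "orthotopic",     [3.8, 3.1, 1.9, 1.1]),
]
same_line_fraction = fraction_grouping_with_own_line(toy_samples)
```

A fraction near 1.0 would be consistent with growth condition having less influence on the
profile than cell line identity; real data would need a proper clustering method and many
more genes.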

Proteomic profiling of NSCLC cell lines and tumors. We have already conducted an RPPA
analysis from a set of NSCLC tumors (Figure 1). Using a panel of 59 proteomic markers,
unsupervised clustering sorted primary lung cancer from normal lung tissue specimens. A
five-marker signature was able to identify tumor versus normal lung (Figure 1A and B) and
squamous versus adenocarcinoma histology. We also assessed a panel of 75 NSCLC cell lines
grown under three media conditions and analyzed for ~150 proteins and phosphoproteins. RPPA
profiling was able to separate distinct subsets of cell lines, normal lung tissue, and lung and
HNSCC by clustering analysis. Interestingly, a group of lung cell lines was identified that was
characteristically similar to the HNSCC (Figure 1C). This panel of cell lines was also correlated
with in vitro drug response, which identified predictive signatures of response (Figure 3). In
the next grant period, the proteomic profiles will be extended to include those of the
heterotransplant models, including post-treatment samples.

Database of molecular profiles from clinical NSCLC specimens, including tumors from the
BATTLE-1 trial. To validate the signatures derived from the cell line panel, we will leverage the
currently available molecular profiles, and additional ones that will be available over the next 6
months, from the BATTLE-1 clinical trial as well as the more than 100 NSCLC tumors profiled
from our tumor archives. We currently have global gene expression profiling data in tumor
specimens from 70 patients, and it is anticipated that data from at least 70 more will become
available by late 2009. Among these tumors, EGFR mutations were observed in 13 patients,
and KRAS mutations were detected in 11 patients.

Figure 1. RPPA profiling separates normal from NSCLC tumor in patient samples and
separates subsets of cell lines with distinct proteomic profiles. (A) Unsupervised
clustering identifies paired normal lung (blue) versus NSCLC tumor (red) by RPPA
markers. (B) From the full set of RPPA markers, a 5-marker signature was identified
and validated that separates normal (blue) from tumor (red). (C) RPPA profiling using
139 proteins categorizes 230 cell lines, including lung, head and neck (HNSCC), and
normal (human bronchial epithelial cell) controls.


Baseline gene expression drug response signatures.

The mRNA profiles for 50 NSCLC lines were correlated with the various drug response
phenotypes to derive signatures predictive of response to various drugs including erlotinib
and the MEK inhibitor AZD6244 (Figure 2). As shown in the figure, the drugs also clustered by
their general mechanism of action (e.g., EGFR inhibitors gefitinib, erlotinib, and cetuximab
together), with AZD6244 having a distinct profile.


Proteomic drug response signatures.

We  correlated  the  proteomic  profiles  with  in  vitro  drug  response  data  for  a  variety  of  drugs 
(Figure  3).  Sixty  cell  lines  were  tested  for  sensitivity  to  the  EGFR  inhibitor  erlotinib,  and  30  to 
AZD6244.  The  cell  lines  were  then  classified  into  "sensitive,"  "intermediate  sensitivity,"  and 
"resistant"  classes  based  on  IC50  values,  and  were  correlated  with  baseline  expression  of  ~150 
proteins  measured  by  RPPA.  Proteomic  signatures  of  in  vitro  response  to  EGFR  inhibition  by 
erlotinib  or  gefitinib,  or  to  MEK  inhibition  using  AZD6244,  were  derived  and  retested,  showing  a 
significant correlation with drug response. For erlotinib, markers of sensitivity included EGFR
itself, HER-2, p16, pSTAT3, and ERK1, several of which had been previously identified;
markers associated with resistance included IGF-1R, FOXO3, and the EMT marker N-cadherin. The
MEK inhibitor AZD6244 had a distinct profile, with pSTAT3 and pSRC associated with
sensitivity, and p16, the p85 subunit of PI3K, MEK2, and phosphoAMPK associated with resistance.
These  data  illustrate  that  this  approach  can  be  used  to  derive  predictive  proteomic  signatures. 
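
The correlation step described above can be sketched as follows. This is a minimal
illustration under stated assumptions, not the analysis pipeline used in this project: the
marker names, protein levels, IC50 values, and class cutoffs are all hypothetical (the report
does not state the cutoffs), and log-transformed IC50 is assumed as the response variable.
A negative Pearson r here means higher baseline protein level goes with lower IC50, that is,
with sensitivity.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant input has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

def classify_by_ic50(ic50, sensitive_below, resistant_above):
    """Assign a response class from an IC50 value and two (hypothetical) cutoffs."""
    if ic50 < sensitive_below:
        return "sensitive"
    if ic50 > resistant_above:
        return "resistant"
    return "intermediate"

def rank_markers(protein_levels, log_ic50):
    """Correlate each marker with log IC50 across cell lines.

    protein_levels maps marker name -> list of baseline levels (one per cell line).
    Returns (marker, r) pairs sorted from most sensitivity-associated (r near -1)
    to most resistance-associated (r near +1).
    """
    scores = [(marker, pearson_r(levels, log_ic50))
              for marker, levels in protein_levels.items()]
    return sorted(scores, key=lambda pair: pair[1])

# Hypothetical toy data for five cell lines (not from the report).
log_ic50 = [-1.0, -0.5, 0.0, 0.5, 1.0]
protein_levels = {
    "marker_A": [5.0, 4.0, 3.0, 2.0, 1.0],  # high level, low IC50: sensitivity
    "marker_B": [1.0, 2.0, 3.0, 4.0, 5.0],  # high level, high IC50: resistance
    "marker_C": [2.0, 1.0, 3.0, 1.0, 2.0],  # no linear relationship
}
ranked = rank_markers(protein_levels, log_ic50)
```

In practice a signature would keep only markers whose correlation passes a significance
threshold after correction for the number of proteins tested.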

High  SRC-3  expression  correlates  with  EGFR-TKI  resistance. 

Overexpression of steroid receptor co-activator 3 (SRC-3) was correlated with resistance to
the EGFR inhibitors cetuximab, gefitinib, and erlotinib by proteomic profiling of
NSCLC cell lines. Subsequent downregulation of SRC-3 by siRNA in a gefitinib-resistant
NSCLC cell line (H1819) restored sensitivity to gefitinib, which then induced cell death (Figure
4).


[Figure 2 image: heat map of genes (mRNA) versus drugs, with AZD6244 and erlotinib marked.
Color scale: Pearson r from -1.0 to +1.0; scale labels: "Expression Correlates with
Sensitivity" and "Expression Correlates with Resistance".]

Figure  2.  Unsupervised  clustering  of  mRNA  expression  signatures  predicting  sensitivity 
and  resistance  to  various  chemotherapy  agents  groups  drugs  by  mechanism  of  action. 
The  mRNA  expression  patterns  from  the  microarray  data  for  each  of  the  50  NSCLC  lines 
were  correlated  with  the  various  drug  response  phenotypes  to  derive  signatures 
predictive  of  response,  including  erlotinib  (blue  arrow)  and  AZD6244  (red  arrow).  The 
statistical Pearson r values that correlate expression of the individual genes with drug
sensitivity and resistance across the 50 lung cancer lines are color-coded, with green
indicating that expression correlates with sensitivity and red indicating that expression
correlates with resistance.

Figure 3. Proteomic signatures cluster drugs by mechanism of action and predict drug response. (A).
Signatures  were  derived  from  NSCLC  panel  IC50  and  RPPA  data.  Pearson  correlation  values  are  calculated 
between  the  RPPA  expression  vectors  of  each  protein  (columns)  and  the  IC50  vectors  of  each  drug  (rows), 
across  all  NSCLC  lines.  Green  (blue  in  panel  B)  indicates  increased  expression  correlates  with  sensitivity,  red 
with  resistance.  Black  arrow  indicates  EGFR  inhibitors  gefitinib  and  erlotinib;  blue  arrow  AZD6474,  red  arrow 
MEK  inhibitor  AZD6244.  (B).  Proteomic  markers  that  predict  sensitivity  to  EGFR  inhibitors,  but  resistance  to 
AZD6244  (red  boxes).  (C,  D).  Protein  markers  that  predicted  response  were  identified  for  erlotinib  (27  markers) 
and  AZD6244  (22  markers,  panel  D).  Leave-one-out  cross-validation  comparing  predicted  IC50s  with  measured 
IC50s  demonstrated  a  good  performance  of  the  markers  for  each  drug. 
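
Leave-one-out cross-validation of the kind mentioned in this caption can be sketched as
follows. The model form is an assumption made for illustration only: a single signature
score per cell line is related to log IC50 by ordinary least squares, and each line is
predicted from a fit that excludes it. The report does not describe the actual predictive
model, and the data below are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor is constant; slope is undefined")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x

def leave_one_out_predictions(xs, ys):
    """Predict each y from a model fitted on all other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions

# Hypothetical toy data: signature score and measured log IC50 for six cell lines.
signature_score = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
measured_log_ic50 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

predicted_log_ic50 = leave_one_out_predictions(signature_score, measured_log_ic50)
mean_abs_error = sum(abs(p - m) for p, m in
                     zip(predicted_log_ic50, measured_log_ic50)) / len(measured_log_ic50)
```

One caveat applies to any such validation: if the markers themselves are selected using all
cell lines before the loop, the held-out estimate is optimistic, so marker selection should
be repeated inside each fold.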

Figure 4. Inhibition of SRC-3 sensitizes H1819 cells to gefitinib and induces apoptosis. Inhibition of SRC-3 by
siRNA sensitized the gefitinib-resistant H1819 cells to gefitinib, which then induced cell death as depicted by
imaging (A) and annexin V staining (B).

Key Research Accomplishments

•  Completed  protein  profiling  and  gene  expression  profiling  for  50  NSCLC  cell  lines. 

•  Derived  baseline  gene  expression  signatures  predictive  of  response  by  correlating  mRNA 
expression  with  drug  response. 

•  Derived  proteomic  drug  response  signatures  by  correlating  proteomic  profiles  with  drug 
response  data  for  a  variety  of  drugs. 

•  Using  baseline  proteomic  profiles,  markers  of  radiation  sensitivity  and  resistance  were 
identified  in  lung  cancer  cell  lines  (Yordy  et  al.,  ASTRO  2008;  Yordy  et  al.,  ASCO  2008). 

•  Identified  factors  associated  with  age  and  sex  differences  in  NSCLC  (Herynk  et  al., 
Proceeding  of  the  Flight  Attendants  Medical  Research  Institute,  2009;  Herynk  et  al., 
Proceedings  of  the  International  Association  for  the  Study  of  Lung  Cancer,  2009). 

•  Identified SRC-3 as a potential biomarker of response to EGFR inhibitors.

Conclusions 


RPPA  proteomic  profiling  and  gene  expression  profiling  for  a  large  number  of  cell  lines  was 
performed  and  has  provided  the  basis  for  identifying  intracellular  signaling  pathways  and 
proteins  associated  with  sensitivity  and  resistance  to  chemotherapies  and  targeted  agents  in 
NSCLC  cell  lines  and  tumor  samples.  These  profiles  will  allow  multiple  biomarker  analyses. 
One  of  the  identified  markers,  SRC-3,  was  found  to  be  correlated  with  resistance  to  EGFR 
inhibitors.  Inhibition  of  SRC-3  in  a  gefitinib-resistant  cell  line  was  able  to  reverse  resistance  to 
the  inhibitor.  These  results  show  that  the  model  is  successful  at  identifying  relevant  biological 
targets  that,  when  inhibited,  are  able  to  reverse  resistance  to  a  targeted  agent.  Our  findings  will 
be  further  investigated  by  correlating  RPPA  of  tumor  samples  with  clinical  outcomes  in  samples 
from  the  BATTLE-1  trial  and  other  clinical  samples  with  the  goal  of  developing  predictive 
markers  that  can  guide  treatment  selection  and  identify  new  targets  in  NSCLC. 


Project 2: Tumor molecular profiles in patients with operable non-small cell lung cancer
(NSCLC): impact on stage, prognosis, and relapse pattern.

(Leaders: Drs. David Stewart, Jack Roth; Co-Leaders: Drs. Roy Herbst, Edward Kim, Katherine
Pisters, Stephen Swisher)

Hypotheses: 

We  hypothesize  that: 

1.  In  tumors  from  patients  with  NSCLC,  patterns  of  co-expression  of  molecules  that  modulate 
cell  proliferation,  survival,  angiogenesis,  invasion,  metastasis  and  apoptosis  will 
substantially  influence  tumor  stage  and  size  at  the  time  of  diagnosis,  and  will  largely  define 
patient  prognosis. 

2.  Impact  of  adjuvant  and  neoadjuvant  therapies  on  disease-free,  progression-free,  and  overall 
survival  will  vary  across  prognostically  distinct  groups. 

3.  Specific  molecular  signatures  in  primary  tumors  will  predict  both  metastatic  patterns  at 
relapse  and  molecular  profiles  of  recurrent  tumors,  and  this  could  help  guide  adjuvant 
strategies and therapeutic strategies at relapse.

Specific Aims:

Aim  1:  To  define  characteristic  TTF/gene  expression  profiles  of  prognostically  distinct 
subpopulations  of  patients  with  resectable  NSCLC,  and  to  assess  the  extent  to  which 
these  molecular  profiles  correlate  with  tumor  stage  and/or  size. 

The  main  goal  of  this  aim  is  to  use  150  archival  NSCLC  tumor  samples  from  our  tissue  bank 
(with  corresponding  clinical  data)  and  to  prospectively  collect  tumor  samples,  blood  samples, 
and  clinical  data  from  300  additional  patients  undergoing  surgical  resection  of  NSCLC.  The 
tissue  and  blood  samples  will  be  used  by  Project  3  and  the  Pathology  Core  to  generate 
comprehensive  TTF/gene  expression  molecular  profiles  using  methods  developed  in  Project  1. 
We will construct Kaplan-Meier estimated survival curves for disease-free survival,
progression-free survival, and overall survival, and will use Cox proportional hazards models and recursive
partitioning  methods  to  identify  important  biomarkers  and  prognostically  distinct  subpopulations. 
We  will  also  correlate  TTF/gene  expression  molecular  profiles  with  initial  tumor  size  and  stage. 
In  addition,  we  will  explore  the  feasibility  of  using  nonlinear  regression  analyses  of  semilog  plots 
of  %  disease-free  survival,  %  progression-free  survival,  and  %  overall  survival  vs  time  to 
facilitate  identification  of  prognostically  distinct  subpopulations  with  characteristic  TTF/gene 
expression  molecular  profiles. 
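
The survival analyses named in this aim can be illustrated with a short sketch. It computes
a Kaplan-Meier estimate from follow-up times with censoring, then fits a single exponential
decay to the curve. Two assumptions should be noted: the fit below is a log-linear
least-squares fit to ln(survival) versus time, used here as a simpler stand-in for the
nonlinear regression of semilog plots described above, and the follow-up data are invented.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times:  follow-up time for each patient
    events: True if the event (e.g., relapse or death) occurred, False if censored
    Returns [(time, survival_probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

def fit_exponential_decay(curve):
    """Fit S(t) = A * exp(-rate * t) by least squares on ln S versus t.

    Returns (rate, half_life). Points with S = 0 are skipped because ln 0 is undefined.
    """
    points = [(t, math.log(s)) for t, s in curve if s > 0]
    if len(points) < 2:
        raise ValueError("need at least two points with positive survival")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    stt = sum((t - mean_t) ** 2 for t, _ in points)
    slope = sum((t - mean_t) * (y - mean_y) for t, y in points) / stt
    rate = -slope
    if rate <= 0:
        raise ValueError("survival does not decay; half-life is undefined")
    return rate, math.log(2) / rate

# Hypothetical toy cohort: months of follow-up and whether relapse was observed.
follow_up_months = [6, 12, 18, 24, 30]
relapsed = [True, False, True, True, False]

km_curve = kaplan_meier(follow_up_months, relapsed)
decay_rate, half_life_months = fit_exponential_decay(km_curve)
```

On a semilog plot, a single homogeneous population would appear as one straight line;
a clear change in slope is one way prognostically distinct subpopulations might show up.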

Aim  2:  To  assess  the  impact  of  adjuvant  and  neoadjuvant  chemotherapy  on  disease-free 
survival,  progression-free  survival,  and  overall  survival  in  prognostically  distinct 
subgroups,  and  to  provide  tumor,  blood  and  clinical  data  to  Project  3  for  an  assessment 
of  factors  contributing  to  resistance  to  chemotherapy  and  to  Project  5  for  assessment  of 
profiling  of  EGFR  and  related  molecules  by  new  quantum  dot  technologies. 

Of  the  450  patients  included  in  the  project,  we  will  assess  100  new  prospectively  recruited 
patients  who  will  receive  neoadjuvant  therapy,  100  patients  who  will  receive  postoperative 
adjuvant  therapy  (including  approximately  20  tumor  bank  patients  and  80  new  patients),  and 
250  patients  who  did  not  receive  adjuvant  or  neoadjuvant  therapy  (including  approximately  130 
tumor  bank  patients  and  120  new  patients).  We  will  collect  patient  clinical  data  on  all  450 
patients  and  will  collect  blood  samples  on  the  300  new,  prospectively  recruited  patients.  Tumor 
and  blood  samples  and  clinical  data  will  be  provided  to  Project  3  for  studies  of  therapeutic 
resistance  and  to  Project  5  for  assessment  of  profiling  of  epidermal  growth  factor  receptor 
(EGFR)  and  related  molecules  by  new  quantum  dot  technologies,  while  in  Project  2  we  will 
assess  impact  of  adjuvant  and  neoadjuvant  therapy  on  outcome  in  each  prognostic  group. 
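
As a quick consistency check, the cohort sizes given in this aim can be tallied against the
150 archival and 300 prospective samples stated in Aim 1. The counts below are taken from
the text (several are described there as approximate); the dictionary keys are labels
chosen for this sketch.

```python
# Planned patients per treatment group, split by sample source (from the text above).
cohorts = {
    "neoadjuvant": {"tumor_bank": 0, "prospective": 100},
    "adjuvant": {"tumor_bank": 20, "prospective": 80},
    "no_adjuvant_or_neoadjuvant": {"tumor_bank": 130, "prospective": 120},
}

def tally(cohorts):
    """Return totals per sample source and the overall number of patients."""
    tumor_bank = sum(group["tumor_bank"] for group in cohorts.values())
    prospective = sum(group["prospective"] for group in cohorts.values())
    return {"tumor_bank": tumor_bank,
            "prospective": prospective,
            "all_patients": tumor_bank + prospective}

planned_totals = tally(cohorts)
```

The totals come to 150 tumor bank patients, 300 prospectively recruited patients, and 450
patients overall, matching the figures stated in Aims 1 and 2.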

Aim  3:  To  correlate  TTF/gene  expression  molecular  profiles  in  the  primary  tumor  with 
metastatic  patterns  and  with  tumor  molecular  profiles  at  relapse. 

For  patients  who  relapse,  we  will  define  metastatic  sites  at  relapse,  obtain  tumor  tissues  from 
selected  patients  who  undergo  biopsies  to  confirm  relapse,  and  define  TTF/gene  expression 
molecular  profiles  in  the  patients’  original  primary  tumor  specimens  that  predict  sites  of  later 
relapse  (and  in  particular  that  predict  relapse  in  brain).  We  will  also  assess  whether  tumor  at 
relapse  is  enriched  for  particular  molecular  characteristics  that  may  promote  metastasis  when 
compared  to  the  primary  tumor,  and  will  assess  the  extent  to  which  TTF/gene  expression 
molecular profile at diagnosis may help guide choice of therapies at relapse.

Summary of Research Findings

As  presented  in  further  detail  in  Project  3  and  the  Pathology  Core,  we  have  identified 
approximately  736  archival  tumor  samples  from  our  Tissue  Bank  that  match  eligibility  criteria  for 
inclusion  in  this  trial.  We  are  now  in  the  process  of  assessing  in  detail  the  quality  of  RNA,  DNA, 
and  protein  that  is  available  from  these  specimens  prior  to  making  a  final  selection  of  the  subset 
of  150  that  will  be  used  for  full  analysis  under  PROSPECT.  In  addition,  tissue  microarrays  have 
already  been  constructed  on  327  of  these  736  samples  and  we  have  completed  staining  each  of 
these  for  the  immunohistochemical  (IHC)  assessment  of  more  than  100  relevant  biomarkers. 
Initial  biostatistical  assessments  on  a  subset  of  markers  (cytoplasmic,  membrane  and/or  nuclear 
staining for CA IX, COX2, CTR1, DcR2, DNMT1, ERCC1, HIF-1α, Ki67, p14ARF, p16 INK4a,
p21 WAF1/CIP1, p53, RB, pRB, SHARP2, SURVIVIN, TGFβ, VEGF, GLUT4, RhoA, Folate
Receptor alpha, and RFC1) revealed in univariate analysis that cytoplasmic staining for HIF-1α
and nuclear staining for pRB correlated significantly with overall survival. Factors correlating
with time to relapse include membrane expression of CA IX and p16. Factors correlating with
relapse with borderline significance (p<0.10) included ERCC1, RB, and TGFβ.

Several  of  the  markers  correlated  significantly  with  the  stage  and  with  the  lung  cancer  type. 
Higher  N  stage  was  associated  with  significantly  decreased  expression  of  cytoplasmic  and 
nuclear  CTR1,  cytoplasmic  DNMT1,  and  cytoplasmic  RB.  Compared  to  squamous  cell 
carcinomas, adenocarcinomas had significantly higher expression of TGFβ, CTR1, cytoplasmic
DNMT1, cytoplasmic ERCC1, VEGF, p16, p14ARF, FOLR1, and RhoA and significantly lower
expression of SHARP2, CA IX, nuclear DNMT1, nuclear ERCC1, nuclear RB, SURVIVIN,
p21 WAF1/CIP1, and p53. High expression of FOLR1 in adenocarcinomas is of interest since it might
explain  the  greater  efficacy  of  the  multitargeted,  antifolate  agent  pemetrexed  in 
adenocarcinomas  than  in  squamous  cell  carcinomas. 

Collection  of  prospective  tumor  samples  is  also  going  well.  Of  the  300  blood  and  tissue 
samples  proposed  over  the  course  of  the  project,  we  have  collected  291  tissue  samples 
between  August  2007  and  May  2009.  Blood  samples  have  been  collected  from  283  patients; 
both  blood  and  tissue  samples  have  been  collected  in  231  patients.  We  had  proposed  to  collect 
tissue  samples  from  100  patients  who  received  neoadjuvant  chemotherapy,  and  to  date  we 
have  collected  74.  Tissues  from  an  additional  100  neoadjuvant  patients  have  been  accessed 
from  our  preexisting  tissue  bank  specimens.  Hence,  we  are  ahead  of  schedule  on  specimen 
procurement  for  the  project. 

In  Project  3,  expression  profiles  in  tumors  from  patients  in  Project  2  who  received  neoadjuvant 
chemotherapy  will  be  compared  to  those  in  patients  who  did  not.  Tumors  surviving  neoadjuvant 
chemotherapy  will  be  regarded  as  a  model  of  acquired  resistance.  In  related  work,  we  found 
that  tumors  exposed  to  chemotherapy  or  targeted  therapies  within  the  previous  3  months  had 
decreased expression of the copper/platinum transporter CTR1, suggesting a mechanism by
which  exposure  to  a  broad  range  of  agents  could  secondarily  lead  to  resistance  to  cisplatin  and 
carboplatin. 

In  last  year’s  report,  we  also  outlined  preliminary  work  that  had  been  performed  using 
exponential  decay  nonlinear  regression  analysis  of  patient  survival  plots,  and  conclusions  that 
had  been  drawn.  This  previous  work  defined  the  process  to  be  used  for  future  correlations  of 
the biomarker data with patient outcomes.

Key Research Accomplishments

•  Collected  tumor  specimens  on  291  lung  cancer  patients  (including  74  who  had  received 
neoadjuvant  chemotherapy). 

•  Collected  blood  samples  on  283  lung  cancer  patients  (including  64  who  received 
neoadjuvant  chemotherapy). 

•  Performed  preliminary  assessment  of  impact  of  18  biomarkers  on  survival,  and  their 
correlation  with  stage  and  tumor  type. 

Conclusions 


During this project period, we identified eligible archival tumor specimens and are currently
assessing the quality of the RNA, DNA, and protein available from them prior to the full analysis
under PROSPECT. Specimen collection continues at a brisk pace and will further our goal of
predicting  future  sites  of  relapse  by  examining  the  molecular  profiles  associated  with  the  patient 
tissues.  Further  analysis  is  needed  to  assess  the  extent  to  which  TTF/gene  expression 
molecular  profile  at  diagnosis  may  help  guide  choice  of  therapies  at  relapse. 


Project  3:  Molecular  Profiling  of  Non-Small  Cell  Lung  Cancer  Tissue  Specimens  and 
Serum  and  Plasma  Samples:  Correlation  with  Patient  Response  and  Tumor  Resistance 
to  Chemotherapy. 

(Leader:  Dr.  Ignacio  Wistuba;  Co-Leaders:  Lin  Ji  and  John  Minna) 

Hypothesis: 

In  Project  3,  we  hypothesize  that  systematic  molecular  profiling  of  surgically  resected  non-small 
cell  lung  cancer  (NSCLC)  tissue  specimens  using  therapeutic  target-focused  (TTF)  and  mRNA 
approaches,  along  with  serum  phosphopeptide  screening  and  plasma  DNA  analysis,  will  lead  to 
the  following  results: 

1. Validation in patients’ tissue specimens of molecular signatures obtained from NSCLC cell
lines  that  are  associated  with  in  vitro  and  in  vivo  (xenograft)  resistance  of  NSCLC  cell  lines 
to  chemotherapeutic  and  targeted  agents. 

2.  Identification  of  molecular  profiling  signatures  associated  with  NSCLC  sensitivity  or 
resistance  to  chemotherapeutic  agents  that  can  identify  NSCLC  patients  most  likely  to 
respond  to  a  given  targeted  therapeutic  agent. 

3.  Development  and  validation  of  serum  phosphopeptide  profiles  and  plasma  DNA  markers 
associated  with  NSCLC  patient  response  and  tumor  resistance  to  chemotherapeutic  agents. 

Objectives: 

The  greatest  obstacle  to  creating  effective  treatments  for  lung  cancer  is  the  development  of 
resistance  to  both  chemotherapeutic  and  targeted  agents.  In  this  highly  integrated  and 
translational  program  project,  we  tackle  one  of  the  most  clinically  significant  problems  in  lung 
cancer:  the  prediction  of  patient  response  to  therapy,  especially  in  the  context  of  tumor 
resistance  to  current  standard  chemotherapies.  The  main  objectives  of  this  project  are  as 
follows: 

a)  To  profile  surgically  resected  tumor  tissue  specimens  obtained  from  NSCLC  patients  to 
validate molecular signatures found in the TTF and mRNA profiles developed in Project 1.
These profiles will be compared with molecular signatures obtained from NSCLC cell lines
that are associated with in vitro and in vivo (xenograft) resistance to chemotherapeutic and
targeted  agents. 

b)  By  comparing  NSCLC  tumor  specimens  (collected  in  Project  2)  from  patients  who  have 
received  preoperative  chemotherapy  and  from  those  who  have  not,  to  validate  TTF  and 
mRNA  signatures  that  are  found  in  Project  1  to  be  associated  with  resistance  to  therapy  and 
with  the  activation  of  resistance-associated  molecular  pathways  or  that  are  found  in  Project 
1  to  be  potentially  exploitable  as  new  therapeutic  targets. 

c)  To  identify  serum  and  plasma  biomarkers  as  surrogate  markers  to  predict  the  response  of 
NSCLC  patients  to  neoadjuvant  chemotherapy  and  to  predict  patient  outcome. 

d)  To  provide  tissue-  and  serum-based  molecular  profile  signatures  or  markers  to  Project  2  that 
can  predict  the  clinical  outcome  of  NSCLC  patients  who  had  undergone  surgical  resection 
with  curative  intent,  with  or  without  neoadjuvant  therapy. 

This  interdisciplinary  research  proposal  for  profiling  cell  lines,  tumor  tissue,  and  serum  samples 
from  NSCLC  patients  requires  extensive  histopathological,  molecular,  and 
immunohistochemical  studies,  which  will  be  coordinated  and/or  performed  by  the  Pathology 
Core  (see  Pathology  Core’s  report). 

Specific  Aims: 

Aim  1:  To  validate,  in  retrospectively  collected  NSCLC  tumor  tissue  specimens,  the  TTF 
and  mRNA  profiles  predictive  of  the  in  vitro  and  in  vivo  (xenograft)  resistance  of  NSCLC 
cell  lines  to  chemotherapeutic  and  targeted  agents. 

Summary  of  proposal:  We  will  select  150  surgically-resected  NSCLC  tumor  specimens  from  The 
University  of  Texas  Lung  SPORE  (UT-SPORE)  Tissue  Bank  for  TTF  and  mRNA  profiling.  Using 
those  150  frozen  archival  NSCLC  tumor  tissues,  we  will  perform  reverse-phase  protein  array 
(RPPA),  multiplex  bead-based  protein  analysis  (MBA)  and  Affymetrix  U133  Plus  2.0  array  to 
validate  the  molecular  signatures  developed  in  Project  1.  Then,  we  will  compare  the  profile 
signatures  obtained  from  the  NSCLC  tumor  specimens  with  the  signatures  obtained  from 
NSCLC  cell  lines  in  Project  1  that  predict  the  in  vitro  and  in  vivo  resistance  to  chemotherapeutic 
and  targeted  agents.  Finally,  using  formalin-fixed  and  paraffin-embedded  tissue  specimens,  we 
will  validate  the  expression  of  proteins  abnormally  represented  in  the  molecular  profiling 
analyses  of  NSCLC  tumor  specimens  by  using  tissue  microarrays  (TMAs)  and  semiquantitative 
immunohistochemical  (IHC)  methods. 

Summary  of  Research  Findings 

In  March  2009,  Dr.  Li  Mao  (former  Co-Leader)  left  the  institution  to  accept  a  position  at  the 
University  of  Maryland  at  Baltimore.  Dr.  Wistuba  has  accepted  the  task  of  taking  over  Dr.  Mao’s 
responsibilities  on  this  project  and  will  continue  to  provide  leadership  in  this  capacity. 

During  the  second  year  of  this  grant,  we  have  achieved  the  following:  1)  We  finalized  the 
selection  and  processing  of  all  surgically  resected  NSCLC  tissue  specimens  needed  for 
molecular  profiling  as  proposed  in  Aim  1;  2)  We  refined  the  profiling  plan  of  NSCLC  tissue 
specimens,  and  we  expanded  our  profiling  plans  to  include  miRNA  and  DNA;  3)  We  explored 
alternative  approaches  for  the  molecular  profiling  of  tissue  specimens,  such  as  formalin-fixed 
and  paraffin-embedded  (FFPE)  samples;  4)  In  collaboration  with  Project  4  (Dr.  A.  Tsao),  we 
performed  a  comprehensive  molecular  profiling  of  malignant  pleural  mesothelioma  (MPM)  tissue 
specimens and cell lines. The detailed progress update is as follows:

1. Selection and processing of all surgically resected NSCLC tissue specimens needed
for molecular profiling as proposed in Aim 1. To develop molecular signatures (from mRNA
and miRNA profiles, and reverse phase protein array, RPPA) in NSCLC specimens, we
extracted RNA and DNA from frozen tumor and normal tissue from over 600 NSCLCs with
annotated clinicopathologic information, including outcomes (recurrence-free and overall
survival). We are currently in the process of selecting 250 surgically resected stage I to IIIA
NSCLCs for the profiling experiments. Of these, 125 cases will have received adjuvant
chemotherapy and 125 cases will not.

a) Detailed histopathological analysis of NSCLC frozen tissue specimens. In collaboration with
the Pathology Core, detailed histopathological analysis was performed using a technique
developed in-house called the “shaving method” (please see Pathology Core report for
additional detail). This technique uses 5-µm-thick haematoxylin-eosin (H&E)-stained histology
sections obtained at 4 levels of the tissue specimen that are alternated by two sets of thirty
20-µm-thick sections obtained for DNA, RNA, and protein extractions. For detailed histopathological
analysis,  each  tumor  and  normal  H&E-stained  section  was  examined  by  an  experienced  lung 
cancer  pathologist  to  assess  the  percentage  of  tumor  vs.  adjacent  normal  tissues  and,  most 
importantly,  the  percentage  of  malignant  cells  vs.  tumor  non-malignant  stromal  (inflammatory, 
vascular  and  fibroblasts)  cells  and  normal  cells  present  in  the  adjacent  normal  tissue.  In 
addition,  tumor  cell  viability  has  been  addressed  by  examining  the  presence  of  necrosis  and 
hemorrhage.  Detailed  histopathological  analysis  was  performed  on  661  NSCLC  tumors.  These 
661  cases  represent  90%  of  736  NSCLC  cases  we  selected  from  the  UT-Lung  SPORE  Tissue 
Bank.  Paired  normal  and  tumor  samples  were  found  in  634  (96%)  of  cases.  Among  these  661 
tumor cases, 353 contain >70% tumor content and >50% tumor cell content. In addition, we are
in the process of digitizing all slides for future comparisons of these detailed
histopathological analyses.

b)  DNA  and  RNA  extraction  from  NSCLC  frozen  tissue  specimens.  DNA  was  extracted  from 
1,294  samples,  including  773  tumor  and  613  normal  samples;  paired  normal  and  tumor  samples 
were found in 541 cases (88%). The average DNA concentration among these samples was
267 ng/µl (range, 0.9-5,631 ng/µl), and the average total yield per extraction was 102.8 µg
(range, 0.1-3,549 µg). RNA was extracted from 1,302 samples, which include 550 tumors with 76 duplicate
samples. These represent 75% of 736 NSCLCs selected from our UT-Lung SPORE Tissue
Bank. Paired normal and tumor samples were found in 537 cases (98%). In these samples, the
average RNA concentration was 776 ng/µl (range, 3.4-4,758 ng/µl), and the average total yield per
extraction was 158 µg (range, 0.3-3,214 µg). An overview of the characteristics of the DNA and RNA
extracted  is  shown  in  Table  1.  The  extraction  of  proteins  for  RPPA  is  pending,  and  will  be 
completed  during  the  third  year  of  this  grant;  however,  the  appropriate  laboratory  protocols  for 
this  work  have  been  developed. 

As  we  mentioned  in  our  previous  report,  a  large  variability  in  DNA  and  RNA  quantities  as 
expressed  in  micrograms  is  observed  in  the  NSCLC  tissue  specimens.  The  quality  of  RNA 
obtained  for  our  mRNA  (Affymetrix)  profiling  analysis  seems  reasonable  and  meets  our 
expectations. The average RNA Integrity Number (RIN; Agilent Bioanalyzer) for our samples is
5.8 (340 NSCLC samples have the Affymetrix-recommended RIN >5.0), and thus, these samples
are eligible for mRNA Affymetrix profiling studies.

Table 1. Summary of the characteristics of the DNA and RNA extracted from 387 NSCLC cases.

                        DNA                                     RNA
                        Tumor               Normal              Tumor               Normal
                        Average   SD        Average   SD        Average   SD        Average   SD
Quantity (µg)           102.8     408.23    38.5      113.7     57.8      104.2     80.3      151.9
Concentration (ng/µl)   267.29    879.68    227.5     638.4     688.9     1182.4    539.6     616.8
RNA Integrity           -         -         -         -         5.8       3.2       5.5       2.6

c)  Selection  of  cases  for  RNA  and  DNA  profiling.  More  than  300  cases  have  been  selected  for 
profiling  with  the  following  criteria:  a)  frozen  tumor  tissue  with  >70%  tumor  content  per  histology 
quality  control;  b)  frozen  tumor  tissue  with  >30%  of  malignant  cell  content;  c)  mRNA  RIN  >4; 
and,  d)  available  clinical  data  (adjuvant  therapy  status). 
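
The four selection criteria listed above amount to a simple filter over the annotated case records. The following is a minimal sketch of that filter, not the program's actual pipeline: the `Case` record, its field names, and the example values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Case:
    # All field names are hypothetical; the report does not describe a data schema.
    case_id: str
    tumor_content_pct: float         # % tumor tissue per histology quality control
    malignant_cell_pct: float        # % malignant cells within the tumor tissue
    rin: float                       # RNA Integrity Number of the mRNA sample
    adjuvant_status: Optional[str]   # "yes", "no", or None when clinical data are missing


def is_eligible(case: Case) -> bool:
    """Apply criteria a) to d): >70% tumor content, >30% malignant cells,
    RIN >4, and available adjuvant therapy status."""
    return (
        case.tumor_content_pct > 70
        and case.malignant_cell_pct > 30
        and case.rin > 4
        and case.adjuvant_status is not None
    )


def select_cases(cases: List[Case]) -> List[Case]:
    """Return the cases that pass every criterion, preserving input order."""
    return [c for c in cases if is_eligible(c)]


# Invented example records, one failing each criterion in turn.
example_cases = [
    Case("A", 80, 50, 6.1, "yes"),   # passes all criteria
    Case("B", 65, 50, 7.0, "no"),    # fails tumor content
    Case("C", 90, 25, 7.0, "no"),    # fails malignant cell content
    Case("D", 90, 60, 3.5, "yes"),   # fails RIN
    Case("E", 75, 40, 4.5, None),    # missing clinical data
    Case("F", 71, 31, 4.1, "no"),    # passes all criteria
]
selected_ids = [c.case_id for c in select_cases(example_cases)]
```

All thresholds are strict inequalities, matching the ">" signs in the criteria, so a case with exactly 70% tumor content would be excluded.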

2.  Profiling  plan  of  NSCLC  tissue  specimens.  We  have  defined  the  molecular  profiling 
analysis  to  be  performed  on  the  250  NSCLCs  selected  as  follows:  a)  miRNA  profiling  using  the 
Agilent  human  miRNA  microarray  Rel12.0  (Agilent  Technologies,  Inc.,  Santa  Clara,  California, 
USA);  b)  mRNA  Affymetrix  profiling  using  the  U133  Plus  2.0  chips  array;  c)  DNA  array 
comparative  genomic  hybridization  (aCGH)  using  the  244K  Agilent  array.  We  are  currently  in 
the process of aliquoting the RNA and DNA samples for the various profiling platforms. We plan
to complete all of these molecular profiling analyses during the third year of the grant.

3.  Alternative  approaches  for  the  molecular  profiling  of  tissue  specimens.  Formalin-fixed 
paraffin-embedded (FFPE) samples are widely available and provide a valuable resource for
studying the molecular basis of disease, the molecular profiles of tumors, and the association
between molecular changes and clinical outcomes. Because RNA extracted from FFPE samples is
degraded and chemically altered, the use of microarrays for gene expression
analysis in FFPE samples has been largely hampered. New technology and methodologies
have been developed to extract RNA, and new array platforms have been designed to measure
gene expression in FFPE samples. In this study, we have shown that microarray analysis of
FFPE samples, after strict quality control and careful data processing, can be used to build a
robust prognosis signature for NSCLC. We analyzed 75 FFPE tumor samples from NSCLCs.
RNA was isolated from each sample using a Response Genetics kit (Response Genetics). The
Affymetrix U133 Plus 2.0 microarray platform was used to obtain gene expression profiles.

Major findings. A set of 1,400 genes passed the FFPE sample microarray data quality control
criteria, and we refer to this gene set as the “robust gene set” (RGS). On the basis of the
expression  of  these  robust  genes,  patients  could  be  divided  into  two  groups;  notably,  the  patient 
samples  in  Group  1  were  primarily  squamous  lung  cancer  (82%),  whereas  the  patient  samples 
in  Group  2  were  primarily  adenocarcinoma  lung  cancer  (93%;  P<0.0001). 

To  investigate  whether  the  two  groups  defined  by  RGS  expression  profiles  have  different  clinical 
prognoses, we drew Kaplan-Meier curves for both overall survival (OS) time and recurrence-
free survival (RFS) time for the two groups (Figure 1). Of interest, Group 1 showed significantly
shorter OS time (median survival time = 3 years) compared to Group 2 (median survival time
was not reached; P=0.017 from log-rank test). Group 1 also had shorter RFS time (median
RFS = 2.4 years) compared to Group 2 (median RFS = 4.2 years; P=0.09 from log-rank test). The
association between RGS groups and death or disease progression was independent of stage
(HR=4.38, P=0.012 for OS; HR=2.15, P=0.059 for RFS). After showing the strong associations
between  the  groups  defined  by  RGS  expression  profiles  and  the  clinical  outcomes,  we  explored 
whether  the  RGS  expression  profile  can  be  used  to  predict  the  lung  cancer  patients’  survival. 
First,  we  randomly  divided  55  patients  into  training  (25  samples)  and  testing  (30  samples)  sets. 
We built a prediction model using the 1,400 RGS values through a supervised principal component
analysis approach using the training data, and then validated this prediction model using the
testing data. We found that the predicted low-risk group had significantly longer survival time
than the predicted high-risk group (median OS = 2.78 years for the high-risk group; median OS for
the low-risk group was not reached; P=0.013). We then demonstrated that our RGS can be used to
train and test the prediction models in frozen samples using one of the largest independent lung
cancer microarray data sets, the recently published NCI Director’s Consortium for the study of
lung cancer, which included 442 resected NSCLCs. Thus, using the FFPE signature, we found that
the predicted low-risk group had a significantly longer survival time than the predicted
high-risk group (median OS = 2 years for the high-risk group and 4.5 years for the low-risk
group; P=0.000013) (Figure 1). A manuscript is currently in preparation with these data.
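
The group comparisons reported above rely on the log-rank test. As an illustration of how such a P value is obtained, the following is a minimal sketch of a two-group log-rank test in plain Python. It is not the analysis code used for this report; the function name and the toy follow-up times are hypothetical, and the P value uses the chi-square approximation with 1 degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*  : follow-up times for each patient in the group
    events_* : 1 if the event (death or recurrence) was observed, 0 if censored
    Returns (chi_square, p_value) using a chi-square distribution with 1 df.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = 0.0   # events observed in group A
    expected_a = 0.0   # events expected in group A under the null hypothesis
    variance = 0.0     # hypergeometric variance summed over event times
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)   # at risk, group A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)   # at risk, group B
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Toy data (invented): group A fails early, group B fails late.
chi2_sep, p_sep = logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
# Identical groups give no evidence of a difference.
chi2_same, p_same = logrank_test([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
```

A stage-adjusted comparison such as the hazard ratios quoted above would require a Cox proportional hazards model, which this sketch does not cover.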

4. Comprehensive molecular profiling of malignant pleural mesothelioma (MPM) tissue
specimens and cell lines. In collaboration with Project 4 (Dr. A. Tsao), we performed a
comprehensive profiling analysis of 53 MPM tissue specimens and 5 MPM cell lines.

a)  MPM  RNA  and  DNA  Extraction.  We  extracted  total  RNA  from  89  MPM  tissue  samples, 
representing  53  cases,  using  the  TRI  Reagent  (Applied  Biosystems,  Ambion,  USA)  according  to 
the manufacturer's instructions. These 53 cases include 36 cases with paired normal controls
(adjacent and non-tumor tissue) and comprise epithelioid (n=38), biphasic (n=8), and
sarcomatoid (n=7) histotypes. The tumor tissue was “shaved” (see previous
description  of  method)  prior  to  extraction,  which,  based  on  our  preliminary  studies,  allows  higher 
yield  and  quality  of  RNA  to  be  obtained  as  well  as  facilitates  subsequent  analysis  of  the  tissue. 
The  histological  analysis  was  carried  out  by  a  pathologist  and  showed  that  about  68%  of  the 
samples had greater than 70% tumor content. The RNA was quantified using the NanoDrop-1000
spectrophotometer (NanoDrop Technologies, Wilmington, Delaware, USA), and the quality
was determined on the RNA Nano chip using the Agilent 2100 Bioanalyzer. The
spectrophotometric analysis showed that more than 35% of the samples had a 260/280 nm
absorbance ratio equal to or greater than 2.0, with an average yield of 435 ng/µl. The Nano
chip determined that 67% of the samples had RIN values >5.

Figure 1. mRNA profiling of 77 NSCLC FFPE. Two groups of NSCLC cases were
identified (Panel A), and they predicted overall survival in the testing set. The FFPE
signature also predicted survival in a larger dataset of frozen NSCLCs. Panel titles: (A)
Clustering analysis of 77 NSCLC tissues: 2 groups were identified; (B) Supervised prediction
using supervised principal component analysis (training = 33 cases; testing = 34 cases);
Survival, MDACC testing FFPE set; Survival, 442 frozen adenocarcinomas (Beer et al, Nat Med
14:822-827).

b) MPM Messenger RNA profiling. A total of 250 ng of total RNA from each of the 89 samples
was sent to the M. D. Anderson Cancer Center MicroArray Core Facility for analysis, where
the samples were labeled via the double in vitro transcription (IVT) protocol and hybridized onto
Affymetrix  U133  Plus  2.0  chips.  These  chips  determine  the  relative  expression  level  of  more 
than  47,000  transcripts  representing  most  of  the  human  genes.  The  Core  facility  scanned  the 
chips  and  has  delivered  the  data  to  Dr.  Kevin  Coombes  (Bioinformatics  Core)  for  subsequent 
analysis.  Preliminary  analysis  of  these  samples  using  Principal  Component  Analysis  (PCA)  in 
GeneSpring  GX  10  software  (Agilent  Technologies,  Inc.,  Santa  Clara,  California,  USA)  showed 
distinct  differences  between  the  normal  (red)  and  tumor  tissue  (blue)  (Figure  2). 
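
The preliminary analysis above was run in GeneSpring GX 10, whose internals are not described in this report. As a rough illustration of what a mean-centered PCA computes, the following sketch extracts the first principal component of a small samples-by-features matrix using power iteration. The function name, the sample orientation (one row per sample), and the toy data are assumptions made for illustration; a matrix with tens of thousands of transcripts would call for a numerical library rather than this plain-Python version.

```python
import math


def first_principal_component(samples, iterations=100):
    """Return (direction, scores) for the first principal component.

    samples: list of rows, one per sample, each a list of feature values.
    """
    n, p = len(samples), len(samples[0])
    # Mean-center every feature (column).
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample covariance matrix (p x p); practical only for small p.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    # Power iteration from a non-uniform start vector. A start vector that
    # happens to be orthogonal to the leading eigenvector would not converge.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # the data have no variance
        v = [x / norm for x in w]
    # Project each centered sample onto the component.
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return v, scores


# Toy data (invented): four samples lying on the line y = x.
direction, scores = first_principal_component([[0, 0], [1, 1], [2, 2], [3, 3]])
```

In a plot such as the one described for Figure 2, each sample would be placed by its scores on the first few components, and a separation between normal and tumor samples would appear as two clusters of scores.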


c) MPM MicroRNA profiling. The same set of 89 samples that was profiled for mRNA was also
profiled in Dr. Wistuba’s lab for microRNA content using the Agilent human miRNA microarray.
Human miRNA Microarray Rel12.0 arrays contain approximately 866 human and 89 human viral
miRNAs, which represent the complete content sourced from the Sanger miRBase v12.0. Slides
were scanned on an Agilent microarray scanner (model G2565A) at 100% sensitivity and 5 micron
settings at the U.T.M.D.A.C.C. Genomics Core facility. Feature Extraction software version
10.5.1 was used for image analysis and quality assessment to obtain primary data, which were
further analyzed for data reduction and cluster analysis using GeneSpring GX version 10.

d)  DNA  profiling.  DNA  was  isolated  from  tissue  shavings  using  the  DNAzol  Reagent  (Molecular 
Research  Center,  Inc,  Cincinnati,  OH,  USA).  As  with  the  RNA,  spectrophotometric  analysis  was 
used  to  determine  the  quantity  and  purity  of  the  samples,  while  quality  was  assessed  on  the 
DNA  chip  using  the  Agilent  2100  Bioanalyzer.  The  spectrophotometric  analysis  showed  that 
more than 37% of the samples had a 260/280 nm absorbance ratio equal to or greater than
1.8, with an average yield of 150 ng/µl. The DNA chip determined that 99% of the samples had
molecular weight greater than 10 kilobases. Of the 53 cases, about 47 tumor cases will be
analyzed for SNP and copy number variations using the Human1M-Duo platform (Illumina, Inc.,
San Diego, CA, USA), which contains more than 1 million SNPs along with copy number
variation  content. 

e) Protein Profiling. We are in the process of extracting proteins from these 89 samples for
Reverse Phase Protein Array (RPPA) analysis using the protocol obtained from Dr. John
Heymach’s lab (Project 1). The protein lysates will be printed with the help of Dr. Heymach’s lab
for the RPPA analysis.

Figure 2. Preliminary analysis of these samples using Principal
Component Analysis (PCA) in GeneSpring GX 10 software showed
distinct differences between the normal (red) and MPM tumor tissue
(blue). Parameters shown in the figure: Algorithm: Principal Components Analysis;
Column indices = [2-97]; Mean centered = true; PCA on = Columns.

f) MPM Cell line Profiling Update. We have acquired from various sources about 17 mesothelial
and mesothelioma cell lines with a good distribution of different histotypes, including at least 4
epithelioid, 2 biphasic, and 2 sarcomatoid morphological types. Of these, six cell lines have
already  been  profiled  for  messenger  RNA  and  miRNA  content.  Other  cell  lines  are  awaiting  their 
characterization  as  genuine  mesothelioma  cell  lines.  DNA  from  these  cell  lines  has  been  sent  to 
the  MDACC  Microarray  Core  facility  for  SNP  and  copy  number  analysis  on  the  Affymetrix  SNP 
6.0  platform.  Additionally,  we  have  obtained  protein  lysates  from  these  cell  lines  to  print  RPPAs 
in  Dr.  Heymach’s  laboratory  (Project  1). 

To explore the role of epigenetically mediated regulation of miRNAs in MPM, we performed
pharmacological unmasking of miRNA expression in cell lines. miRNAs have emerged as key
players in human carcinogenesis. Recently, studies have shown that some miRNAs can be
epigenetically silenced by aberrant hypermethylation in human cancer. Five cell lines,
including one normal mesothelial (Met5A) and four MPMs (epithelioid H2452, biphasic H211, and
unclassified H28 and H2052), were treated in vitro with the demethylating agent 5-aza-cytidine
(5-Aza; 1 µM) and SAHA (2.5 µM) for 96 hrs. After RNA extraction (Trizol), miRNA profiling was
performed with the Agilent human microRNA kit v2. A total of 299 (51%) miRNAs were up-regulated
(two-fold) after the treatment in the normal mesothelial Met5A cell line, but fewer miRNAs were
up-regulated in the malignant cell lines: 171 (29%) in H2452, 79 (13.5%) in H211, 55 (9.4%) in
H28, and 56 (9.6%) in H2052. We detected 167 (55.9%) miRNAs that were exclusively
up-regulated in Met5A, 56 (32.7%) in H2452, 21 (26.6%) in H211, 16 (29.1%) in H28, and 18
(32.1%) in H2052. Among all unique miRNAs, only 17 (let-7b, let-7c, let-7f-2, miR-302c, miR-328,
miR-510,  miR-125b-1,  miR-16-1,  miR-223,  miR-302b,  miR-383,  miR-551b,  miR-922,  miR-148a, 
miR-18b,  miR-302d,  miR-326)  have  been  previously  associated  with  human  carcinogenesis. 
Interestingly,  one  miRNA  (miR-148a)  has  been  associated  with  a  microRNA  tumor  metastasis 
signature.  The  number  of  total  and  unique  miRNA  upregulated  after  5-Aza  and  SAHA  was  lower 
in  MPM  cell  lines  compared  with  normal  Met5A  cell  line.  Up-regulation  of  unique  miRNAs  was 
found  to  be  associated  with  cell  lines  obtained  from  some  specific  subtypes  of  MPM.  The 
identification  of  metastasis-associated  miR-148a  suggests  a  potential  biomarker  for  metastasis 
in  this  highly  malignant  neoplasm. 
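
The counts above combine two steps: calling a miRNA up-regulated when treatment raises its signal at least two-fold, and then finding the miRNAs up-regulated in one cell line only. The following is a minimal sketch of both steps. It assumes linear-scale (not log-transformed) intensities, and the function names, miRNA labels, and intensity values are hypothetical.

```python
def upregulated(treated, untreated, fold=2.0):
    """Return the set of miRNAs whose treated/untreated ratio is at least `fold`.

    treated, untreated: dicts mapping miRNA name to a linear-scale intensity.
    miRNAs missing from `untreated`, or with a non-positive baseline, are skipped.
    """
    return {
        name
        for name, value in treated.items()
        if untreated.get(name, 0) > 0 and value / untreated[name] >= fold
    }


def exclusive_sets(up_by_line):
    """For each cell line, return the miRNAs up-regulated in that line only."""
    result = {}
    for line, mirnas in up_by_line.items():
        others = set().union(
            *(s for other, s in up_by_line.items() if other != line)
        )
        result[line] = mirnas - others
    return result


# Invented intensities for two cell lines and three placeholder miRNAs.
untreated = {"miR-x": 2.0, "miR-y": 2.0, "miR-z": 4.0}
up_by_line = {
    "Met5A": upregulated({"miR-x": 8.0, "miR-y": 3.0, "miR-z": 10.0}, untreated),
    "H2452": upregulated({"miR-x": 3.0, "miR-y": 2.5, "miR-z": 9.0}, untreated),
}
unique_by_line = exclusive_sets(up_by_line)
```

The percentages given for the exclusive sets are consistent with each exclusive count divided by that cell line's total of up-regulated miRNAs, for example 167/299 = 55.9% for Met5A and 56/171 = 32.7% for H2452.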

Aim  2:  To  develop  TTF  and  mRNA  signatures  of  NSCLC  resistance  to  chemotherapy,  and 
identify  chemoresistance-associated  targets/pathways  as  new  therapeutic  targets. 

Summary  of  proposal:  Whereas  Aim  1  focuses  on  the  identification  in  archived  tumor  specimens 
of  TTF  and  mRNA  molecular  profiles  detected  in  NSCLC  cell  lines,  the  main  focus  of  Aim  2  is  to 
determine  whether  the  molecular  signatures  in  the  tumor  specimens  correlate  with  patient 
response  to  neoadjuvant  chemotherapy.  From  the  clinical  trial  in  Project  2,  we  will  use 
specimens  from  100  NSCLC  patients  who  received  neoadjuvant  therapy  and  had  surgical 
resection  with  curative  intent  (cases)  and  from  200  NSCLC  patients  who  had  surgical  resection 
but  did  not  receive  neoadjuvant  therapy  (controls)  to  perform  RPPA,  MBA,  and  Affymetrix  U133 
Plus  2.0  array  analyses.  Then,  we  will  compare  the  TTF  and  mRNA  profile  signatures  obtained 
from  these  NSCLC  tumor  specimens  with  signatures  obtained  in  Project  1  to  predict  the  in  vitro 
and  in  vivo  resistance  of  NSCLC  cell  lines  to  therapy.  Those  data  will  be  provided  to  Project  2 
for  correlation  with  clinical  characteristics,  including  prognosis  and  metastasis.  Finally,  using 
formalin-fixed  and  paraffin-embedded  tissue  specimens,  we  will  validate  the  expression  of 
proteins  abnormally  represented  in  the  molecular  profiling  analyses  in  NSCLC  tumor  specimens 
from all patients enrolled in Project 2 by using TMAs and semiquantitative IHC methods.

Summary of Research Findings

During  the  second  year  of  this  program,  in  collaboration  with  the  Pathology  Core,  we  have 
mainly  focused  on  the  identification,  characterization,  and  processing  of  tissue  specimens  from 
surgically  resected  NSCLC  obtained  from  patients  who  have  received  neoadjuvant 
chemotherapy.  From  the  736  NSCLC  cases  selected  in  the  first  year,  we  have  identified  147 
(20%)  patients  who  have  received  neoadjuvant  chemotherapy.  Our  goal  to  obtain  100  NSCLC 
cases  with  neoadjuvant  therapy  has,  therefore,  been  reached. 

Selection of prospectively collected cases: Since the activation of the PROSPECT laboratory
protocol in August 2007, the Pathology Core has collected fresh and FFPE tissue specimens
from 79 cases that have received neoadjuvant chemotherapy. Our goal of obtaining 100
prospectively collected NSCLC cases in the neoadjuvant-treated group was also reached this
year.

Processing  of  the  tissues  and  molecular  profiling  (mRNA  and  protein):  These  experiments  will 
begin  after  specimen  profiling  in  Aim  1  is  completed.  We  expect  to  initiate  these  experiments  by 
the  end  of  the  third  year  of  the  grant. 

Additional  tissue  sets  for  profiling  studies:  As  reported  last  year,  Dr.  Li  Mao  signed  a 
collaborative  agreement  with  the  Intergroupe  Francophone  de  Cancerologie  Thoracique  (IFCT) 
to obtain up to 250 frozen lung tumor tissues from patients enrolled in the IFCT-0002 clinical trial (an
open-label, multicenter, randomized phase III study), which was designed to define the best
timing  of  neoadjuvant  chemotherapies.  These  samples  will  be  used  to  identify  a  gene 
expression  signature  of  resistance  to  platinum-based  chemotherapies.  Samples  of  170  of  these 
cases  were  received  in  Dr.  Mao’s  laboratory,  and  are  now  going  through  histology  quality 
control  process.  As  Dr.  Mao  has  recently  relocated  to  another  institution,  Dr.  Wistuba  has 
assumed  responsibility  for  these  analyses. 

Pathological analysis of NSCLC response to neoadjuvant therapy. In collaboration with Dr.
Wistuba, Drs. Abujiang Pataer and Stephen G. Swisher from the Department of Thoracic and
Cardiovascular Surgery at M.D. Anderson Cancer Center have initiated a study on the
assessment of the pathological response to chemotherapy in NSCLC. Our main goal is to
determine whether pathologic and radiological features can predict response after
chemotherapy of lung cancer and to identify potential biomarkers that may assist in the
selection of patients for specific therapies in the future. The identification of genes involved in
chemo-resistance may also allow development of novel therapies to enhance the clinical
efficacy of chemotherapy. We plan to accomplish the following three specific goals: (1)
Evaluate surgically resected patients treated with neoadjuvant chemotherapy. (2) Determine
whether pathologic or radiologic criteria of chemotherapy response are associated with
long-term survival. (3) Determine the role of biomarkers in chemotherapy response.

Histological  recognition  of  cases  in  which  neoadjuvant  therapy  has  been  given  and  the 
description  of  the  extent  of  the  residual  tumor  will  become  increasingly  important  in 
prognostication  and  in  evaluating  postoperative  therapeutic  options.  We  identified  147  patients 
from  2001-2005  in  which  neoadjuvant  therapy  was  received  before  surgical  resection.  The 
clinical and pathological information has been obtained in all cases. To refine histological
parameters for tumor regression and describe patterns of tumor reaction to therapy, we
collected 133 formalin-fixed and paraffin-embedded tissue specimens from the Thoracic
Malignancy Tissue Bank and PROSPECT Pathology Core. Of those specimens, 129 cases also have
frozen tissue. The histological patterns of treatment-induced tumor regression that were analyzed
included viable tumor, necrosis, fibrosis, mixed inflammatory infiltrate, foamy macrophages, and
giant  cells.  Figures  3A,  B,  C,  and  D  show  typical  examples  of  the  histopathology  of  tumors 
associated  with  no  or  extensive  response  to  treatment.  In  most  tumors,  fibrosis  was  present;  in 
some  cases  fibrosis  was  the  predominant  manifestation  of  treatment  response  (Figure  3C).  In 
some  response  cases,  the  foamy  macrophages  were  associated  with  multinucleated  giant  cells 
with  cholesterol  clefts  (Figures  3C  and  D).  We  then  determined  the  size  reduction  by 
radiological assessment in 100 cases. Figures 3E, F, G, and H show typical examples of the
radiology of tumors associated with no or extensive response to treatment. In some tumors, an
increase in radiological size was observed (Figures 3E and F). A 56% radiological size
reduction was recorded in other cases (Figures 3G and H). We next will analyze the
correlation of pathological or radiological features with patient outcomes. We will construct
Kaplan-Meier estimated survival curves for disease-free survival, progression-free survival, and
overall survival, and we will use Cox proportional hazards models in our study. We will
determine the correlation between extent of viable cancer cells or fibrosis and radiological
estimate  of  size  reduction.  We  will  identify  chemo-resistant  and  chemo-sensitive  groups  based 
on  radiological  or  pathological  features  for  biomarker  discovery. 
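
For the planned survival analyses, the Kaplan-Meier estimate can be computed directly from follow-up times and event indicators. The following is a minimal sketch of the product-limit estimator and of reading a median from it. The function names and the toy data are hypothetical, and Cox proportional hazards modeling is outside the scope of this sketch.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving   # events and censored patients both leave the risk set
    return curve


def median_survival(curve):
    """First time at which survival drops to 0.5 or below, or None if not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy data (invented): five patients, two of them censored.
km_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
km_median = median_survival(km_curve)
# With no observed events the curve is empty and the median is not reached.
km_none = median_survival(kaplan_meier([1, 2, 3], [0, 0, 0]))
```

A return value of `None` corresponds to a median survival time that was not reached during follow-up.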


Figure 3. Typical examples of the histopathology and radiology of tumors associated with no response (A, B, E and F;
chemoresistance, >60% viable tumor) or extensive response to treatment (C, D, G and H; chemosensitive, <30% viable tumor).

In  addition,  during  the  second  year  of  the  grant,  we  have  developed  3  additional  projects  to 
investigate  in  NSCLC  novel  biomarkers  related  or  potentially  related  to  resistance  to 
chemotherapy. These studies are the following: a) Expression of Keap1 and Nrf2 in NSCLC; b)
Expression  of  cell  membrane  receptors  in  NSCLC;  and  c)  Expression  of  cancer  stem  cell 
markers  in  NSCLC.  The  ultimate  goal  is  to  test  if  the  expression  of  these  markers  associates 
with  resistance  to  chemotherapy  in  this  disease.  A  brief  description  of  these  projects  and  the 
major  findings  includes  the  following: 

a) Keap1 and Nrf2 expression in NSCLC correlates with clinicopathological features. Nuclear
factor erythroid-2 related factor 2 (Nrf2) is a transcription factor associated with in vitro
resistance to chemotherapy. Kelch-like ECH-associated protein 1 (Keap1) is a cytoplasmic
repressor of Nrf2. KEAP1 inactivation is a relatively frequent genetic alteration in NSCLC, and
leads to Nrf2 activation. We investigated the IHC expression of nuclear Nrf2 and cytoplasmic
Keap1 proteins in 304 surgically resected NSCLC tissues in tissue microarrays
(adenocarcinomas, n=190; squamous cell carcinomas, n=114) (Figure 4). We correlated those
findings with patients’ clinicopathological features and, in adenocarcinomas, with EGFR and
KRAS mutations. We also examined the expression of Nrf2 and Keap1 using whole tissue
sections in 79 NSCLC tumors (36 chemo-naive and 43 treated with neoadjuvant
chemotherapy). We detected Nrf2 expression in 26% (77/299) of NSCLCs, and expression was
significantly higher in squamous cell carcinoma (43/112, 38%) compared with adenocarcinoma
(34/188, 18%; P=0.0001). In adenocarcinomas, Nrf2 was not expressed in EGFR mutant tumors
(0/23), compared with 21% of wild-type tumors (31/145; P=0.009). Keap1 expression score was
significantly higher in squamous cell carcinoma compared with adenocarcinoma (P<0.0001). In
patients with stage I/II NSCLC who did not receive adjuvant or neoadjuvant treatment, Nrf2
overexpression significantly correlated with poor overall survival in multivariate analysis
(HR=2.468; 95% CI 1.468, 4.151; P=0.0007). In patients with squamous cell carcinoma
histology, low Keap1 expression correlated with poor overall survival (HR=0.479; 95% CI 0.260,
0.882; P=0.018). KEAP1 mutation (exons 2-5) was detected in 1/20 tumors examined. Normal
bronchial epithelia adjacent to NSCLC tumors did not show Nrf2 expression, suggesting that a
field-effect phenomenon related to Nrf2 expression was not present. We conclude that:
1) increased expression of Nrf2 and decreased expression of Keap1 are relatively frequent
abnormalities in NSCLC, especially in squamous cell carcinoma histology; and 2) altered IHC
expression of these markers correlates with NSCLC patients’ outcome. The identification of the
subset of patients with abnormal expression of Nrf2 may be important for better selection of
treatment in NSCLC.
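
As a reading aid for the hazard ratios quoted above, the following sketch shows how a reported hazard ratio and its 95% confidence interval relate to a two-sided P value. It assumes a Wald-type interval that is symmetric on the log scale, which is common for Cox models but is not stated in this report; the function name `p_value_from_hr_ci` is hypothetical.

```python
import math


def p_value_from_hr_ci(hr: float, ci_low: float, ci_high: float,
                       z_crit: float = 1.959964) -> tuple:
    """Approximate the Wald z statistic and two-sided P value from HR and 95% CI.

    Assumes the interval is exp(log(HR) +/- z_crit * SE), i.e. symmetric on
    the log scale. This is an assumption, not something the report states.
    """
    log_hr = math.log(hr)
    # The CI width on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_hr / se
    # Two-sided normal tail probability.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Values quoted in the text above.
z_nrf2, p_nrf2 = p_value_from_hr_ci(2.468, 1.468, 4.151)
z_keap1, p_keap1 = p_value_from_hr_ci(0.479, 0.260, 0.882)
```

Under this assumption the recomputed P values are approximately 0.0007 and 0.018, in line with the values reported for the two survival analyses.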

b) IHC expression of membrane transporters correlates with histology of NSCLC. Folate
receptor alpha (FOLR1), reduced folate carrier 1 (RFC1), copper transporter receptor 1 (CTR1),
glucose transporter 4 (GLUT4) and RHOA regulate uptake of molecules and drugs into the cell.
FOLR1 and RFC1 are overexpressed in epithelial tumors and are potential therapeutic targets
and tumor biomarkers. IHC protein expression of FOLR1, RFC1, CTR1, GLUT4 and RHOA was
examined in 320 surgically resected NSCLCs placed in tissue microarrays, including 202
adenocarcinomas and 110 squamous carcinomas, and correlated with patients’
clinicopathological characteristics. A semi-quantitative IHC score was obtained by assessing the
intensity of immunostaining and the percentage of positive tumor cells. The pattern of IHC
expression varied in malignant cells, with FOLR1, RFC1 and GLUT4 expressed in the membrane
and cytoplasm, CTR1 expressed in the cytoplasm and nucleus, and RHOA expressed only in the
cytoplasm. In all cases, expression in tumor cells was higher than in non-malignant lung
epithelial cells.


Figure 4. Representative example of Keap1 and Nrf2 protein expression by IHC in a squamous cell
carcinoma of the lung harboring a KEAP1 mutation (Panel A; panel labels: Keap1 low expression,
Nrf2 nuclear expression, KEAP1 mutation). Kaplan-Meier curve showing five-year overall survival
of NSCLC by IHC nuclear expression of Nrf2 (Panel B).




Army  Award  W81XWH-07-1-0306;  Waun  Ki  Hong,  M.D. 

Annual  Report:  Reporting  Period  01  June  2008  -  31  May  2009 


Tumor  stromal  IHC  expression  was  frequently  detected,  especially  in  endothelial  cells, 
lymphocytes,  macrophages,  and  fibroblasts.  Adenocarcinomas  showed  significantly  higher 
expression  compared  with  squamous  cell  carcinoma  for  most  markers,  including  membrane 
(P<0.001)  and  cytoplasmic  (P<0.001)  FOLR1,  cytoplasmic  (P<0.001)  and  nuclear  (P<0.004) 
CTR1,  and  cytoplasmic  RHOA  (P<0.001).  Female  NSCLC  patients  had  significantly  higher 
expression  of  membrane  and  cytoplasmic  FOLR1  (P=0.01)  compared  with  male  patients. 
Patients with a smoking history demonstrated significantly lower expression of membrane
(P<0.001) and cytoplasmic FOLR1 (P<0.002), and higher expression of membrane (P=0.04) and
cytoplasmic (P=0.03) GLUT4, and membrane RFC1 (P=0.01) when compared with never-smokers.
In adenocarcinomas, the presence of EGFR mutations correlated with higher expression of
membrane FOLR1 (P<0.002), and KRAS mutation with higher expression of membrane GLUT4
(P<0.004) and lower expression of nuclear CTR1 (P=0.02). We conclude: 1) membrane
transporter proteins are overexpressed in NSCLC compared with normal lung epithelium; 2)
significant  differences  were  found  between  adenocarcinomas  and  squamous  lung  cancer  in 
both  tumor  cells  and  the  tumor  microenvironment;  and  3)  differences  were  found  in  tumors  of 
males  and  females,  between  tumors  from  never-  and  ever-smokers,  and  between  tumors  with 
EGFR  or  KRAS  mutations.  The  different  patterns  of  transporter  expression  may  explain  the 
superior  response  of  NSCLC  patients  with  adenocarcinoma  histology  to  pemetrexed. 

c) Expression of stem cell markers in NSCLC and correlation with clinicopathologic features.
Cancer stem cells (CSCs) represent a minority population of self-renewing tumor cells that are
believed to play an important role in tumor development and metastasis, and in resistance to
therapy. Although some CSC markers have been described in NSCLC, no comprehensive
characterization of multiple CSC markers has been undertaken in this disease. Our aim was to
investigate the pattern of protein expression of a panel of CSC-related markers in a large series
of NSCLCs, and to correlate those findings with patients’ clinicopathologic characteristics. We
examined protein expression by IHC in 287 NSCLCs (178 adenocarcinomas and 109
squamous cell carcinomas, SCC) with a panel of seven CSC markers: EZH2, SOX2, CD24,
CD44, C-kit, BMI-1 and Oct3/4. The pattern of expression of these markers was correlated with
patients’ and tumors’ clinicopathologic characteristics, including outcome of the disease. In
adenocarcinomas, CSC marker expression was also correlated with the EGFR and KRAS
mutation status of the tumors. Expression of EZH2, SOX2, CD44, CD24 and C-kit was detected
in a subset of NSCLC tumors, and no expression of BMI-1 or Oct3/4 was detected in any tumor
specimen. The pattern of expression for these markers varied according to NSCLC
clinicopathologic characteristics, including tumor histology and pathological stage, and patients’
smoking history. Both EZH2 and SOX2 nuclear protein expression were significantly higher in
SCC than in adenocarcinoma (P<0.001). Conversely, CD44 membrane (P<0.001) and CD24
cytoplasmic (P<0.05) expression were significantly higher in adenocarcinoma than in SCC. We
identified a subset of NSCLCs having membrane CD44 high/CD24 low or negative expression.
In adenocarcinomas, EZH2, CD44, and CD24 expression levels were significantly (P<0.001)
higher in current smokers than in never or former smokers. The presence of EGFR mutation in
lung adenocarcinomas correlated significantly with low EZH2 (P=0.03) and high CD44
(P=0.032) membrane expression. Interestingly, in multivariate analysis examining the
expression scores as continuous variables, high nuclear expression of EZH2 correlated
significantly with worse recurrence-free survival (HR=1.006; P=0.0035) and overall survival
(HR=1.005; P=0.0202) in stage I/II lung adenocarcinoma. We have thus provided a
characterization of multiple CSC markers in a large series of NSCLCs. Our findings indicate that
the pattern of CSC marker expression differs between adenocarcinomas and squamous
cell carcinomas of the lung, and that expression correlates with patients’ clinicopathologic
features, including survival. Understanding the role of CSCs in NSCLC tumor development
and progression may provide opportunities to design novel strategies to prevent and treat this
disease.
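
Because the EZH2 expression score was modeled as a continuous variable, the hazard ratios quoted above (1.006 and 1.005) apply per one-unit increase in score, which is why they sit so close to 1. The short sketch below shows how such a per-unit ratio compounds over a larger score difference under the proportional hazards assumption. The 100-point difference is purely illustrative; the scale of the score is not given in this section, and the function name is hypothetical.

```python
def hazard_ratio_for_difference(hr_per_unit: float, delta: float) -> float:
    """Hazard ratio implied by a score difference of `delta` units.

    In a Cox model with a continuous covariate, log(HR) scales linearly
    with the covariate, so HR(delta) = HR(1) ** delta.
    """
    return hr_per_unit ** delta


# Hypothetical 100-point difference in EZH2 score (illustrative only).
rfs_hr_100 = hazard_ratio_for_difference(1.006, 100)
os_hr_100 = hazard_ratio_for_difference(1.005, 100)
```

With the recurrence-free survival estimate, a hypothetical 100-point difference would correspond to a hazard ratio of roughly 1.8.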

Aim  3:  To  identify  surrogate  serum  phosphopeptide  profiles  and  plasma  DNA  markers 
associated  with  NSCLC  tumor  resistance  and  patient  response  to  neoadjuvant 
chemotherapy. 

We will identify serum samples from the UT-SPORE Tissue Bank that match the NSCLC tumor
resection specimens examined in Aim 1. We will use these serum samples for phosphopeptide
profiling and peptide mapping by ProteinChip array-based surface-enhanced laser
desorption/ionization (SELDI) mass spectrometry (MS) and laser desorption/ionization (LDI)
tandem mass spectrometry (MS/MS) to compare serum phosphopeptides with TTF and mRNA profiles. The
phosphopeptide  MS  profiles  from  retrospective  specimens  will  later  be  used  as  references  and 
controls  for  the  prospective  serum  proteomic  analysis.  As  in  Aim  2,  we  will  use  serum  samples 
collected  prospectively  in  Project  2  from  100  NSCLC  cases  undergoing  neoadjuvant 
chemotherapy  and  200  NSCLC  controls  undergoing  surgery  without  neoadjuvant 
chemotherapy,  and,  when  relevant,  at  the  time  of  relapse.  Using  these  serum  specimens,  we 
will  perform  phosphopeptide  profiling  on  ProteinChip  arrays  by  SELDI-MS  to  measure  the 
temporal  changes  in  serum  phosphopeptides  before  and  after  the  therapeutic  intervention.  We 
will  use  LDI-QSTAR-MS/MS  and  liquid  chromatography  (LC)-MS/MS  to  identify  specific  serum 
phosphopeptides  that  are  determined  by  SELDI-MS  to  be  relevant  to  targeted  therapeutic 
response  and  acquired  resistance  in  lung  cancer  patients.  In  addition,  we  will  compare  serum 
phosphopeptide  profiles  with  TTF  (RPPA  and  MBA)  profiles,  mRNA  profiles,  and  TMAs  and  IHC 
analysis  developed  in  Project  1  and  in  Aims  1  and  2  of  this  project.  This  comparison  will  identify 
TTF  serologic  molecular  signatures  and  elucidate  the  biologic  pathways  potentially  associated 
with  patient  response  and  tumor  resistance  to  targeted  therapeutic  agents.  Finally,  in 
collaboration  with  Project  2  we  will  perform  correlation  analysis  of  these  NSCLC  serum 
phosphopeptide  profile  signatures  with  patients’  clinical  characteristics  to  predict  lung  cancer, 
cancer  progression,  cancer  stages,  and  overall  survival  rate;  to  characterize  serum 
phosphopeptide  proteomic  patterns  and  signatures  in  correlation  to  tumor  recurrence,  clinical 
response  to  adjuvant  chemotherapeutic  and  targeted  agents,  and  development  of  resistance; 
and  to  identify  serum  phosphopeptide  markers  as  surrogate  predictors  of  patient  outcome. 

Moreover,  in  Aim  3  we  will  quantify  total  circulating  plasma  DNA  and  methylation-specific  DNA 
in  all  300  patients  with  NSCLC  enrolled  in  the  Project  2  clinical  trial.  The  circulating  DNA  levels 
will  be  correlated  with  patients’  clinicopathologic  characteristics.  Any  changes  in  these  levels 
during  chemotherapy  and  after  surgery  will  be  correlated  with  patient  response  to  neoadjuvant 
therapy  and  patient  outcome  after  surgery.  The  correlation  between  circulating  methylated  DNA 
levels  and  tumor  DNA  methylation  will  also  be  examined  in  a  selected  panel  of  patients. 

Summary  of  Research  Findings 

Protein  phosphorylation  is  a  dynamic,  post-translational  modification  that  plays  a  critical  role  in 
the  regulation  of  a  wide  spectrum  of  biological  events  and  cellular  functions  including  signal 
transduction,  gene  expression,  cell  proliferation,  and  apoptosis.  We  have  developed  a  functional 
proteomics  technique  using  the  ProteinChip  array-based  SELDI-TOF-MS  analysis  for  high 
throughput  profiling  of  phosphoproteins/phosphopeptides  in  human  serum  for  the  early 
detection  and  diagnosis  as  well  as  for  the  molecular  staging  of  human  cancer.  We  have  been 
able  to  use  this  proteomics  platform  to  selectively  isolate,  profile,  and  identify  phosphopeptides 
present  in  a  highly  complex  mixture  prepared  from  human  lung  cancer  patient  serum  samples. 
We have identified a phosphopeptide with a mass of 1752.3 Da as alpha-1-acid glycoprotein 1
precursor (A1AG1), a potential target of multiple protein tyrosine kinases including EGFR, and a
novel ligand of nicotinic acetylcholine receptor (nAChR) protein subunits. We found that the
A1AG1 phosphopeptide is significantly upregulated in cancer serum samples (more than 10-fold
increase in mass peak intensity, P = 0.0024) by SELDI-TOF-MS analysis. The upregulated
phosphorylated A1AG1 has also been detected by phospho-A1AG1-specific ELISA analysis in
serum samples from early stage (Stage I) lung cancer patients and ever-smokers, and in human
lung cancer cell lines. We also detected the interaction of A1AG1 protein with nAChR α4, β2,
and α1 subunits by immunoprecipitation and immunoblotting analysis, suggesting a role for
A1AG1 as a potential ligand of nAChR proteins in the regulation of the nAChR-mediated
signaling pathway in lung carcinogenesis. Further characterization of A1AG1 expression and
biological activity in a larger population of lung cancer patient serum samples and in lung cancer
cell lines in vitro and in vivo will help validate the phospho-A1AG1 peptide as a novel serum
biomarker for early lung cancer detection and intervention.

We plan to further validate phosphopeptide profiling in a large group of lung cancer patient serum
samples, and to analyze A1AG1 and phospho-A1AG1 in lung cancer cell lines, serum, and tissue
samples. During the next project period, we will functionally characterize A1AG1/nAChR subunit
interaction and signaling in lung cancer cell lines. Investigations into the modulation of serum
and cellular phospho-A1AG1 in response to tyrosine kinase inhibitors (TKIs) or tobacco
carcinogens will begin, and we will elucidate the role of A1AG1 in the EGFR/AKT signaling
pathway in lung carcinogenesis, diagnosis, and prognosis.

Key  Research  Accomplishments 

•  Performed  extraction  of  DNA  and  RNA  of  over  600  NSCLC  and  53  MPM  with  annotated 
clinicopathologic  information  for  profiling  analysis. 

•  Developed  an  mRNA  prognostic  signature  for  NSCLC  using  FFPE  tissue  specimens. 

•  Performed  mRNA  and  miRNA  molecular  profiling  in  53  MPM  tumor  and  cell  line 
specimens. 

•  Collected  >200  frozen  NSCLC  tissue  specimens  from  patients  who  received  neoadjuvant 
therapy,  and  evaluated  the  pathological  response  to  chemotherapy  in  133  cases. 

•  Characterized NSCLC tissue specimens for novel biomarkers associated with resistance
to chemotherapy in lung cancer, including Nrf2/Keap1, membrane transporters and cancer
stem cell markers.

Conclusions 


During  the  second  project  period,  we  reached  our  collection  goal  of  NSCLC  tissues  from 
patients  who  received  neoadjuvant  chemotherapy,  and  finalized  the  extraction  of  DNA  and  RNA 
for molecular profiling of chemo-naive surgically resected NSCLCs. We have initiated the
molecular  profiling  of  lung  cancer,  and  developed  an  NSCLC  prognostic  mRNA  signature  using 
FFPE  tissues.  We  are  in  the  process  of  completing  comprehensive  (mRNA,  miRNA,  DNA  and 
protein) profiling analyses of MPM tissue and cell line specimens.

Project 4: Target Modulation Following Induction Treatment With Dasatinib in Patients
With Malignant Pleural Mesothelioma (MPM) and Identification of New Therapeutic
Targets/Strategies for MPM

(Leaders:  Drs.  Anne  Tsao,  Reza  Mehran) 

Hypothesis: 

We  hypothesize  that  dasatinib,  a  broad  spectrum  ATP-competitive  inhibitor  for  oncogenic 
tyrosine  kinases  (BCR-ABL,  SRC,  c-Kit,  PDGFR,  and  ephrin  receptor  kinases),  may  be  a  new 
therapeutic  agent  in  malignant  pleural  mesothelioma  (MPM).  We  also  believe  that  conducting 
therapeutic  target-focused  (TTF)  molecular  and  gene  profiling  (Affymetrix  arrays)  will  lead  to 
development  of  other  novel  therapies  for  MPM. 

Specific  aims: 

Aim  1:  Conduct  a  phase  I  clinical  trial  with  the  primary  endpoint  of  biomarker  modulation 
using  dasatinib  as  induction  therapy  in  patients  with  resectable  MPM. 

1a. Determine the effects of dasatinib induction therapy on selected tumor biomarkers (activated
Src, PDGFR, VEGFR) pre- and post-induction therapy.

1b. Determine the modulatory effects of dasatinib on selected biomarkers of survival and
apoptosis (PI3K/AKT, bcl-xL, caspases), proliferation (IGFR, Ki-67), angiogenesis (IL-8,
bFGF, TNF-α), epithelial-mesenchymal transition (TNF-β, E-cadherin, c-Kit/Slug) and
invasion/migration (Ephrin, MMP) in tumor specimens pre- and post-induction therapy.

1c. Determine the effects of induction dasatinib therapy on tumor mean vessel density, cell
apoptosis, and the proliferation index.

1d. Determine the modulatory effects of dasatinib on serum, platelet, and pleural effusion
markers of survival (PI3K/AKT, bcl-xL, caspases), proliferation (IGFR, Src), angiogenesis
(soluble VEGFR, VEGF, PDGF, IL-8, bFGF, TNF-α), and invasion/migration (Ephrin, MMP).

1e. Determine the drug concentration of dasatinib in tumor and serum.

1f. Assess the effects of dasatinib and cytoreductive surgery on the serum mesothelin-related
peptide (SMRP) level.

1g. Assess the safety and toxicity profile of induction dasatinib in patients with resectable MPM.

Aim 2: Conduct radiographic correlation of tumor response and clinical outcome using
positron emission tomography-computed tomography (PET-CT).

Aim 3: Explore and develop new therapeutic targets and treatment strategies for MPM in
tumor specimens collected from Specific Aim 1 and in MPM cell lines.

3a.  Determine  key  signaling  pathways  involved  in  tumor  resistance  or  sensitivity  to  dasatinib 
using  therapeutic  target-focused  (TTF)  molecular  and  global  gene  expression  profiling  on 
MPM  tumor  specimens  pre-  and  post-  induction  dasatinib  therapy. 

3b.  Determine  the  sensitivity  of  a  panel  of  MPM  cell  lines  to  targeted  agents  tested  in  Project  1 
via  TTF  profiling  and  DATs  (drug  and  therapeutic  target  siRNA). 

Summary  of  Research  Findings 

It  should  be  noted  that  Dr.  Reza  Mehran  has  replaced  Dr.  David  Rice  as  Co-Leader  of  this 
project, due to Dr. Rice’s increased responsibilities with other programmatic activities. As shown
in his appended biosketch, Dr. Mehran brings long-standing expertise in the management of
thoracic  malignancies,  and  leadership  of  large,  complex  clinical  studies. 

We  designed  a  biomarker-based  neoadjuvant  trial  from  our  preclinical  studies  during  the 
previous  project  period.  The  trial  is  intended  to  show  that  dasatinib,  a  multi-targeted  Src  kinase 
inhibitor,  has  activity  against  MPM  and  target-specificity  to  Src  Tyr419.  Untreated  MPM  patients 
underwent  extended  surgical  staging  (ESS)  with  multiple  biopsies  along  the  future  surgical 
incision  line  to  account  for  tumor  heterogeneity  and  evaluate  for  sarcomatoid  features.  If 
deemed  a  surgical  candidate  for  either  pleurectomy/decortication  (P/D)  or  extrapleural 
pneumonectomy  (EPP),  patients  received  4  weeks  of  oral  dasatinib  (70  mg  BID)  followed  by 
P/D  or  EPP.  If  either  a  radiographic  or  molecular  response  (de-phosphorylation  of  Src  Tyr419  in 
tumor)  was  observed,  an  additional  2  years  of  dasatinib  maintenance  after  adjuvant 
radiotherapy  and  systemic  chemotherapy  was  scheduled  for  the  affected  patients. 
Serum/blood/platelets  and  pleural  effusion  were  collected  for  exploratory  analysis  of  peripheral 
surrogate  biomarkers.  The  primary  endpoint  of  this  trial  is  biomarker  modulation  of  Src  Tyr419; 
secondary  endpoints  include  response,  survival,  safety/toxicity,  and  biomarker  modulation  in 
tumor/serum/platelets/pleural  effusion. 

Fourteen  patients  have  been  accrued  to  this  trial  from  April  2008  to  April  2009;  ten  have 
successfully  completed  the  ESS,  neoadjuvant  dasatinib,  and  P/D  (n=6)  or  EPP  (n=4).  Two 
patients  are  currently  receiving  neoadjuvant  dasatinib,  2  patients  were  deemed  to  not  be 
surgical  candidates  due  to  a  rapid  decline  in  PS,  and  one  patient  was  found  to  have  bilateral 
mesothelioma.  The  main  side  effects  recorded  for  dasatinib  were  grade  1-2  anemia,  nausea, 
vomiting,  anorexia,  electrolyte  abnormalities,  fatigue,  and  anxiety.  Grade  3  toxicities  included 
hyperkalemia  (1),  infection  -  pneumonia  (1),  and  hypoxia  (1).  There  were  no  grade  4-5  toxicities 
recorded  for  these  patients.  Post-surgical  grade  3  toxicity  included  anemia,  electrolyte 
abnormalities,  arrhythmia,  HTN,  and  pleural  effusion;  one  grade  4  episode  of  hyperglycemia 
was  seen.  After  4  weeks  of  neoadjuvant  dasatinib  therapy,  there  was  one  non-evaluable 
patient, one with recorded progressive disease (PD), eight with stable disease (SD), and two
minor responses. In the two patients with a radiographic response by PET-CT, the anatomic
response correlated with a molecular response, with dephosphorylation of Src Tyr419 observed
in their tumor tissue. Based on these
clinical  results,  we  found  that  conducting  biomarker-based  clinical  trials  with  novel  agents  in 
MPM  is  feasible  and  necessary  to  further  our  understanding  of  this  deadly  disease.  There  is 
preliminary  evidence  that  a  subgroup  of  MPM  patients  may  gain  clinical  benefit  from  dasatinib 
therapy,  and  that  modulation  of  p-Src  Tyr419  in  MPM  tumor  tissue  is  a  reasonable 
pharmacodynamic  marker  for  dasatinib  treatment.  Future  translational  studies  will  correlate  the 
outcome  and  tumor  p-Src  Tyr419  with  peripheral  surrogate  markers  for  response  and  evaluate 
potential  pathways  of  resistance  to  dasatinib  therapy  in  tumor  tissue.  The  optimal  multi-modality 
treatment  for  resectable  malignant  pleural  mesothelioma  (MPM)  still  remains  unknown.  No  prior 
neoadjuvant  trials  with  targeted  agents  have  been  published  due  to  limited  funding  and  eligible 
patients. 

In other related efforts, in collaboration with this project and Project 3, the Pathology Core has
constructed an MPM tissue microarray containing 76 surgically resected tumor cases, including
epithelioid, sarcomatoid, and biphasic histology types, with well-annotated clinicopathologic
information. This TMA has been used to characterize the expression of several IHC markers
(please see Pathology Core report for further detail). The Pathology Core has also collected,
banked, and characterized MPM tumor tissue from 10 patients enrolled in the MPM dasatinib
clinical trial who underwent video-assisted thoracoscopy (VAT) and extrapleural
pneumonectomy (EPP). A total of 172 (91 baseline [VAT] and 81 at surgery [EPP]) fresh frozen
and formalin-fixed tumor tissue specimens have been obtained, processed, and characterized
by the Pathology Core.

Key  Research  Accomplishments 

•  Demonstrated that the Src Tyr419 biomarker accurately predicted radiographic
response in patients receiving dasatinib therapy.

•  Enrolled  14  patients  on  the  clinical  trial. 

•  Contributed  specimens  used  in  construction  of  an  MPM  tissue  microarray  and  172  MPM 
tumor  tissue  specimens  from  the  clinical  trial  to  the  MPM  tissue  bank. 

Conclusions 


We have demonstrated that this novel trial design is feasible, and preliminary evidence suggests
that our Src Tyr419 biomarker predicts radiographic response in patients receiving dasatinib
therapy. There is a subpopulation of MPM patients that may derive clinical benefit from oral
dasatinib therapy. MPM is a very heterogeneous tumor, and molecular profiling will be necessary
in future studies to ultimately optimize targeted therapy in this disease.

Preliminary evidence suggests that modulation of p-Src Tyr419 is a feasible, reasonable
pharmacodynamic biomarker for dasatinib. Future plans include correlating outcome and tumor
p-Src Tyr419 with peripheral surrogate markers in blood/serum/platelets and pleural effusion,
and analyzing pathways of resistance in MPM tumors.


Project  5:  Development  of  a  Novel  Multi-Biomarker  System  Using  Quantum  Dot 
Technology  for  Assessments  of  Prognosis  of  NSCLC  and  Prediction  of  Outcome  of 
EGFR-Targeted  Therapy 

(Leader:  Dr.  Zhuo  (Georgia)  Chen;  Co-Leaders:  Drs.  Fadlo  Khuri,  Dong  Shin,  Ruth  O’Regan, 
Shi-Yong  Sun) 

Quantum  dots  (QDs)  provide  sharper  fluorescent  signals  than  organic  dyes  and  can  detect 
multi-biomarkers  simultaneously  in  the  same  material,  allowing  quantification  and  correlation  of 
molecular  signature  with  cellular  response  to  targeted  therapies. 

Hypothesis: 

A  multi-biomarker  system  using  quantum  dot  (QD)  technology  will  enhance  accuracy  in 
assessment  of  prognosis  of  non-small  cell  lung  cancer  (NSCLC)  and  prediction  of  outcome  of 
epidermal  growth  factor  receptor  (EGFR)-targeted  therapy. 

Specific  Aims: 

Specific  Aim  1:  Development  of  QD-Abs  and  imaging  systems  for  detection  and 
quantification  of  multi-biomarkers  (MBM)  using  lung  cancer  cell  lines. 

Summary  of  Research  Findings 

Specific  Aim  1  was  completed  in  2008.  Major  findings  were  reported  last  year  and  were 
published in Nanotechnology. Our results illustrated that QD-immunocytochemistry (ICC)-based
technology can not only quantify basal levels of multiplex biomarkers but can also track
the localization of biomarkers upon biostimulus. With this new technology, we found that EGFR
and  E-cad  were  located  mainly  in  the  cytoplasm  in  EGFR-TKI-insensitive  cells;  however,  in 
EGFR-TKI-sensitive  cells,  they  were  found  mainly  on  the  cell  membrane.  After  induction  with 
EGF,  both  EGFR  and  E-cad  internalized  to  the  cytoplasm,  but  the  internalization  capability  in 
EGFR-TKI-sensitive  cells  was  greater  than  that  in  EGFR-TKI-insensitive  cells.  The 
quantification  also  showed  that  the  inhibition  of  EGF-induced  EGFR  and  E-cad  internalization  by 
erlotinib  in  the  sensitive  cells  was  stronger  than  that  measured  in  the  insensitive  cells.  These 
studies  demonstrate  that  there  are  substantial  differences  between  EGFR-TKI-insensitive  and 
EGFR-TKI-sensitive  cancer  cells  in  EGFR  and  E-cad  expression  and  localization,  both  at  the 
basal  level  and  in  response  to  EGF  and  erlotinib.  QD-based  analysis  facilitates  the 
understanding  of  the  features  of  EGFR-TKI-insensitive  vs.  EGFR-TKI-sensitive  cancer  cells  and 
may  ultimately  be  useful  for  the  prediction  of  patients’  response  to  EGFR-targeted  therapy. 

Specific  Aim  2:  Verification  of  QD-Abs  for  detection  and  quantification  of  MBM  by 
comparison  with  conventional  IHC  using  paraffin-embedded  tissues  and  evaluation  of 
their  prognostic  value  in  NSCLC. 

Summary  of  Research  Findings 


Nanoparticle  QDs  are  ideal  materials  for  multiplexed  biomarker  detection,  localization,  and 
quantification;  however,  working  conditions  for  the  application  of  QD  in  staining  of  formalin-fixed 
and  paraffin-embedded  (FFPE)  specimens  need  to  be  optimized.  Both  direct  and  indirect 
methods  are  available  for  QD-based  immunohistofluorescence  (QD-IHF)  staining,  but  the  direct 
method  has  been  considered  laborious  and  costly.  In  this  study,  we  optimized  and  compared 
the  indirect  QD-IHF  single-staining  procedure  using  QD-secondary  antibody  conjugates  and 
QD-streptavidin  conjugates.  Problems  associated  with  sequential  multiplex  staining  were 
identified quantitatively. A method using a QD cocktail solution was developed allowing
simultaneous staining with three antibodies against E-cadherin, EGFR, and β-catenin in FFPE
tissues. The expression of each biomarker was quantified and compared using the cocktail and
the sequential method. Our results demonstrated that the QD signal for each multiplexed
biomarker was more consistent and stable using the cocktail method than the sequential
method, providing a unique tool for potential research and clinical applications (Figure 1). A
quantification method for multiplexing three biomarkers (EGFR, E-cadherin, and β-catenin plus
DAPI in FFPE tissues) was developed using the CRi Nuance spectral system (Figure 2).


Fig. 1 Multiplexed QD-IHF staining of FFPE tissues with the cocktail method. Signals for
EGFR, E-cadherin, and β-catenin plus membrane and nucleus markers are illustrated by
pseudocolors.

We also validated the QD-IHF procedures. The validation included: 1) the comparison of single
biomarker detection using conventional immunohistochemistry (IHC) with QD-based
immunohistofluorescence (IHF); and 2) the comparison of biomarker signals from samples
stained with single QD-IHF in serial sections to biomarker signals from the same proteins but
from samples stained simultaneously with multiple QD-IHF. FFPE tissue sections from 30 FFPE
tissue samples were used for the validation. Both Pearson's and Spearman's tests show
significant correlation between IHC and QD-IHF for the single-marker staining tests (EGFR:
correlation coefficient R = 0.8-0.9, p < 0.00001; E-cadherin: R = 0.9, p < 0.00001; β-catenin:
R = 0.7-0.8, p < 0.00001) and for the singleplex versus multiplex tests (EGFR: R = 0.8-0.9,
p < 0.00001; E-cadherin: R = 0.8, p < 0.00001; β-catenin: R = 0.7-0.8, p < 0.00001) (Figure 3).
Fig. 3 Validation of EGFR expression detected by different staining methods as an example.
(A) EGFR correlation between IHC and QD-IHF: comparison of biomarker signals from IHC with
those from QD-IHF (p < 0.0001). (B) EGFR correlation between the single and cocktail QD-IHF
methods: comparison of the single QD-IHF staining with the simultaneous staining with
multiple QD-IHF (p < 0.0001).
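
The correlation analysis described above can be illustrated with a minimal sketch. It assumes paired per-sample scores from two staining methods; the function names and the example values are hypothetical and are not data from this study. Only the coefficients are computed here; the p-values reported above would come from a statistics package.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical paired scores for six samples (illustration only).
ihc_scores = [10.0, 35.0, 60.0, 120.0, 150.0, 210.0]
qd_ihf_signal = [0.8, 2.9, 4.1, 9.5, 9.0, 15.2]

pearson_r = pearson(ihc_scores, qd_ihf_signal)
spearman_rho = spearman(ihc_scores, qd_ihf_signal)
```

Reporting both coefficients, as the text does, guards against a relationship that is monotonic but not linear: Spearman's coefficient depends only on rank order.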


To complete this specific aim, tissue samples (including tumor specimens and adjacent normal
tissues) from 94 cases of NSCLC with relevant clinical information were collected for IHC and
QD-IHF staining (Figure 4). Quantification of QD-IHF is still ongoing. For quantification of IHC
results, the Weighted Index {WI = [percentage of positive stain × intensity score (0, 1+, 2+, and
3+)] × 100} and the ratio of membrane to total staining (RMT = signal of membrane stain / signal
of total stain) were recorded. Preliminary statistical analysis of IHC showed that both expression
and membrane localization of all three biomarkers in tumor tissue are significantly different from
those in the adjacent normal tissue (Table 1). Development and validation of QD-IHF for the
second set of biomarkers, relevant to the mTOR pathway, is ongoing.
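
The two IHC scores defined above can be written out as a short sketch. It assumes the positive-stain percentage is expressed as a fraction between 0 and 1 (so WI ranges from 0 to 300) and that a single intensity score is assigned per sample; both are assumptions, and the function names are hypothetical.

```python
def weighted_index(positive_fraction, intensity_score):
    """WI = positive fraction x intensity score (0 to 3) x 100."""
    if not 0.0 <= positive_fraction <= 1.0:
        raise ValueError("positive_fraction must lie between 0 and 1")
    if intensity_score not in (0, 1, 2, 3):
        raise ValueError("intensity_score must be 0, 1, 2, or 3")
    return positive_fraction * intensity_score * 100


def membrane_to_total_ratio(membrane_signal, total_signal):
    """RMT = membrane stain signal / total stain signal."""
    if total_signal <= 0:
        raise ValueError("total_signal must be positive")
    return membrane_signal / total_signal


# Example: half of the cells stain at intensity 2+.
example_wi = weighted_index(0.5, 2)
example_rmt = membrane_to_total_ratio(30.0, 60.0)
```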


Table 1: Summary of the Semi-quantification of IHC

             EGFR                     E-cadherin               β-catenin
             WI         % Membrane    WI         % Membrane    WI         % Membrane
Normal       36.0       40.7          179.2      89.7          158.4      82.7
Tumor        97.4       31.1          136        50            130.2      50.4
p-Value*     1.34E-16   2.12E-04      2.44E-11   1.28E-28      7.47E-09   1.49E-29

* Paired t-test
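
For reference, the paired t-test named in the table footnote compares each tumor sample with its own adjacent normal tissue. A minimal sketch of the test statistic follows; the function name and the example values are hypothetical, and the p-value (which requires the t distribution) is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(normal, tumor):
    """Return (t, degrees of freedom) for paired normal/tumor measurements."""
    if len(normal) != len(tumor) or len(normal) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [t - n for n, t in zip(normal, tumor)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t_stat = mean(diffs) / (sd / sqrt(len(diffs)))
    return t_stat, len(diffs) - 1


# Hypothetical scores for four matched pairs (illustration only).
t_stat, dof = paired_t_statistic([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 8.0])
```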
Fig 4. Representative IHC staining of EGFR, E-cadherin, and β-catenin in both adjacent normal and tumor
tissues. Expression of EGFR is significantly higher in tumor tissues than in the adjacent normal tissue, while
E-cadherin and β-catenin show lower expression and more internalization in the tumor tissues than in the
adjacent normal epithelia. We expect that simultaneously characterizing these features in the same tissue using
QD-IHF will correlate the biology more precisely with the progression of these tumor cells. (Magnification 400X)


Specific Aim 3: Correlation of the MBM detected by QD-Abs with outcomes of
chemotherapies and EGFR-targeted therapy using resectable NSCLC tissues.

Summary  of  Research  Findings 

This study was proposed for years 3 and 4 of this grant; thus, there are no updates for this
Specific Aim.

Key  Research  Accomplishments 

• Optimized and validated QD-staining conditions for multiplexing three biomarkers (EGFR,
E-cadherin, and β-catenin) in both cell lines and FFPE tissues.

•  Developed  a  quantification  method  for  QD  signals  using  the  CRi  Nuance  spectral  system. 

•  Collected  training  set  materials  including  94  cases  of  NSCLC  and  their  adjacent  normal 
tissues,  and  entered  clinical  information  into  a  database  for  further  analysis. 

•  Completed  staining  of  the  three  biomarkers  in  the  94  pairs  of  the  NSCLC  tissues  by  both 
IHC  and  QD-IHF  methods.  The  imaging  and  statistical  analyses  are  ongoing. 
Conclusions


In the past year, we completed the proposed cell line studies in Specific Aim 1 and published
the results in Nanotechnology. Our findings provide new biomarkers and QD methodology for
predicting sensitivity to EGFR-targeting therapy, which can be applied to tumor tissue specimens
for clinical application. Furthermore, clarifying substantial differences between EGFR-TKI
sensitive and insensitive cancer cells will help to understand the mechanism of resistance to
EGFR-targeted therapy and facilitate the development of new targeted therapies. During the
project period, we focused on optimization and validation of a quantification strategy for
QD-based IHF. These studies provided a solid foundation for analyzing biomarker expression in
NSCLC tissues. Using this strategy, we have completed the immunostaining of three biomarkers -
EGFR, E-cadherin, and β-catenin - in 94 pairs of the patients' tissue samples. Further imaging
and statistical analyses of these stains will answer an important question: whether
quantification of multiplexed biomarkers by QD-IHF can provide a more accurate correlation with
patients' prognosis and other relevant clinical information than single-biomarker analysis.


Pathology  Core 

(Director:  Dr.  Ignacio  Wistuba) 

The  Pathology  Core  is  an  essential  component  of  the  PROSPECT  program.  The  Pathology 
Core  plays  an  important  role  by  collecting,  processing  and  distributing  tissue  and  serum 
specimens  obtained  from  Clinical  Trials  on  NSCLC  (Project  2)  and  malignant  pleural 
mesothelioma  (MPM;  Project  4)  for  molecular  profiles  and  biomarker  analysis.  Our  objectives 
(functions)  are  as  follows: 

1. Develop and maintain a repository of tissue and serum specimens from patients with
non-small cell lung carcinoma (NSCLC) and malignant pleural mesothelioma (MPM).

2.  Process  NSCLC  cell  lines  and  tissue  specimens  for  histopathologic  and  molecular  analyses. 

3.  Perform  and  evaluate  immunohistochemical  (IHC)  analysis  in  human  tumor  tissue 
specimens  and  mouse  xenograft  tissues. 

Objective  1.  Develop  and  maintain  repository  of  tissue  and  serum  specimens  from 
patients  with  lung  cancer  and  malignant  pleural  mesothelioma  (MPM). 

Summary  of  Research  Findings 

Selection of lung cancer and mesothelioma specimens available in Thoracic Malignancy Tissue
Bank. As reported last year, we identified 1,385 non-small cell lung cancer (NSCLC) tumor
specimens as potential cases for PROSPECT Projects 2 and 3, including the major histology
types adenocarcinoma (n=729) and squamous cell carcinoma (n=414). Of those specimens, we
stored frozen tumor tissue available from patients who have consented to having their tissue
banked and used for research purposes. We selected 736 NSCLC cases for Projects 3 and 4;
147 of the 736 (20%) NSCLC cases have received neoadjuvant chemotherapy. In addition,
more than 4,011 NSCLC cases with formalin-fixed and paraffin-embedded (FFPE) tissues
available have been identified and specimens banked. All the cases retrieved are under
histopathological review and are being classified according to the 2004 WHO Pathology
Classification for lung cancer. Peripheral mononuclear blood cells (PMBC) and serum samples
collected during the surgical resection from 464 NSCLC and 36 MPM patients are also available.
As reported last year, we have identified 108 MPMs with frozen tumor tissue available for
PROSPECT Project 4. FFPE tissues from 91 MPMs have been collected, and clinical and
pathological information has been obtained in all cases.

Prospective collection and banking of lung cancer and mesothelioma specimens for
PROSPECT projects. Since the activation of the PROSPECT laboratory project in August
2007, the Pathology Core has collected fresh and formalin-fixed tissue specimens from 272
NSCLC and 19 MPM surgically resected cases (Table 1). During this period, 347
surgeries for lung cancer and mesothelioma have been performed, and we have collected
tissue specimens in 84% of them. From those, snap-frozen normal and tumor tissue have been
collected in all cases. In addition, we have obtained and banked tumor specimens in RNAlater®
(Ambion, Austin, TX) (n=115 samples), 12% dimethyl sulfoxide (DMSO)-preserved samples
(n=122 samples), and OCT-embedded samples for frozen sectioning (n=96 samples). Of the 272
NSCLC cases, 79 patients have received neoadjuvant chemotherapy. Blood specimens (serum
and PMBC) have been collected, processed, and banked in 283 out of 347 surgeries (69%). Of
interest, both tissue and blood specimens have been obtained in 231 cases (67%).

Table 1. Summary of prospectively collected tumor tissue specimens from NSCLC and MPM cases.

Histology                    Number of Cases
Adenocarcinoma               160
Squamous cell carcinoma      63
Large cell carcinoma         4
Other NSCLC                  41
No tumor present             4
Total Lung Tumors            272
Malignant Mesothelioma       19

Thoracic  Malignancy  Tissue  Bank  Database.  To  improve  the  handling  of  tissue  and  blood 
specimens,  our  institution  developed  a  Web-based  tissue  banking  system,  named 
TissueStation.  We  also  have  a  Microstrategy  Report  Services  system  (Oracle-based),  which 
allows  us  to  run  reports  and  retrieve  data  from  the  TissueStation  database.  Both  applications 
(TissueStation  and  Microstrategy)  have  been  instrumental  to  the  success  of  the  Pathology  Core 
in  obtaining  and  banking  tissue  and  blood  specimens  from  NSCLC  and  MPM  patients. 

Objective  2.  Process  NSCLC  cell  lines  and  tissue  specimens  for  histopathological  and 
molecular  analyses. 


Summary  of  Research  Findings 

Cell Lines. a) Lung Cancer. In collaboration with Project 1 (Drs. J. Heymach and J. Minna), we
have developed a repository of 48 NSCLC cell lines and 2 normal bronchial epithelial cells; b)
Mesothelioma. In collaboration with Projects 3 and 4 (Drs. I. Wistuba and A. Tsao,
respectively), we have acquired 17 mesothelial and mesothelioma cell lines (Table 2) with a
good distribution of different histotypes including 4 epithelioid, 2 biphasic and 2 sarcomatoid
MPMs. Some of these are currently being characterized by immunohistochemistry (IHC) using 7
different markers to confirm that they are authentic mesothelioma cell lines. Three of these IHC
markers, cytokeratin 5/6, calretinin and mesothelin, are seen frequently in MPM tumors,
whereas CEA (carcinoembryonic antigen), B72.3, CD15 (LeuM1) and TTF-1 (thyroid
transcription factor-1) are rarely seen in these tumor cells and, therefore, serve as negative
controls. For all these cell lines, several frozen vials have been obtained and stored for future
work. In addition, RNA, DNA, and protein have been extracted and stored. Importantly, FFPE
pellets have been prepared from all cell lines for use as controls in IHC and fluorescent in situ
hybridization (FISH) experiments. Both NSCLC and MPM cell lines are being STR DNA
fingerprinted for authentication in the M. D. Anderson Molecular Cytogenetics Core facility.


Table 2. List of MPM and normal mesothelial cells stored in the Pathology Core

Cell line     Type                                           Source
HMeso         MPM                                            Dr. Harvey Pass
HP-3          MPM                                            Dr. Harvey Pass
HP-4          MPM                                            Dr. Harvey Pass
HP-5          MPM                                            Dr. Harvey Pass
HP-6          MPM                                            Dr. Harvey Pass
HP-7          MPM                                            Dr. Harvey Pass
HP-9          MPM                                            Dr. Harvey Pass
HP-10         MPM                                            Dr. Harvey Pass
HCT-4012      Pleural Mesothelial (Telomerase-transformed)   Dr. Adi Gazdar
Met-5A        Pleural Mesothelial (SV40-transformed)         ATCC
MSTO-211H     MPM, Biphasic                                  ATCC
H28           MPM, Epithelioid                               ATCC
H2052         MPM, Epithelioid                               ATCC
H2452         MPM, Epithelioid                               ATCC
JL-1          MPM, Epithelioid                               DSMZ
DM-3          MPM, Sarcomatoid                               DSMZ
RS-5          MPM, Sarcomatoid                               DSMZ

Figure 1. Schematic diagram of the shaving method. In this method, tissue processing for
histology quality control and nucleic acids and protein extraction are combined. (Diagram
labels: sets of thirty 20-µm-thick sections alternating with single 5-µm-thick sections.)


Tissue Processing for RNA, DNA and Protein Extractions. In collaboration with Project 3 (Drs. I.
Wistuba, A. Corvalan and S. Suraokar), frozen tumor and normal tissue from 613 NSCLCs and
53 MPMs obtained from the Thoracic Tissue Bank (see Objective 1) have been processed for
the extraction of nucleic acids and proteins. For extractions, a detailed histopathological
analysis was performed using a technique developed in-house called the “shaving method”
(Figure 1). This technique uses 5-µm-thick haematoxylin-eosin (H&E)-stained histology sections
obtained at four levels of the tissue specimen, alternating with two sets of thirty 20-µm-thick
sections obtained for DNA, RNA, and protein extractions. All the shaving processing has been
performed under RNase-free conditions. The microtome and blades were routinely cleaned with
ethanol (70%) to avoid any risk of degradation of RNA by RNase. After cutting, all samples have
been stored in a -30°C freezer until the extractions are ready to be performed.


Histology Quality Control of Tissue Specimens.

For detailed histopathological analysis, each tumor H&E-stained section was examined by
an experienced lung cancer pathologist to assess the percentage of tumor versus adjacent
normal tissues, the percentage of malignant cells versus tumor non-malignant stromal
(inflammatory, vascular and fibroblast) cells, and the normal cells present in the adjacent
normal tissue. In addition, tumor cell viability has been assessed by examining the presence of
necrosis and hemorrhage in the tissues. For NSCLC, a detailed histopathological analysis was
performed on 1,543 slides, 797 of which were tumor slides, and 661 corresponded to unique
tumors. These 661 cases represent 90% of the 736 NSCLC cases available in the UT-Lung SPORE
Tissue Bank. Paired normal and tumor samples were found in 634 (96%) of cases. Among these
661 tumor cases, 353 contain >70% tumor content and >50% tumor cell content. In addition, we
are in the process of digitizing all slides for future comparisons of detailed histopathological
analyses. For MPM, the histology quality control was performed on 159 slides of tumor tissue
and 108 of corresponding normal tissue (Figure 2). All these H&E-stained sections are being
scanned with an Aperio slide scanner (Aperio Technology) and the digital images stored for
future analysis.
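
The tumor-content criterion quoted above (>70% tumor content and >50% tumor cell content) amounts to a simple filter over the pathologist's review records. The sketch below is illustrative only: the record layout, field names, and example cases are assumptions and do not reflect the TissueStation schema.

```python
from dataclasses import dataclass


@dataclass
class SlideReview:
    """Hypothetical record of one pathologist review of a tumor case."""
    case_id: str
    tumor_content_pct: float       # tumor versus adjacent normal tissue
    tumor_cell_content_pct: float  # malignant cells versus stromal cells


def passes_qc(review, min_tumor_pct=70.0, min_tumor_cell_pct=50.0):
    """True when both percentages strictly exceed their thresholds."""
    return (review.tumor_content_pct > min_tumor_pct
            and review.tumor_cell_content_pct > min_tumor_cell_pct)


# Hypothetical cases (illustration only).
reviews = [
    SlideReview("case-A", 85.0, 60.0),
    SlideReview("case-B", 90.0, 40.0),
    SlideReview("case-C", 70.0, 80.0),
]
qualifying_ids = [r.case_id for r in reviews if passes_qc(r)]
```

Keeping the thresholds as parameters makes it easy to see how many cases would qualify under a looser or stricter cut-off.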

MPM Tissue Microarray. In collaboration with Projects 3 and 4, we have constructed an MPM
tissue microarray (TMA) containing 76 surgically resected tumor cases, including epithelioid,
sarcomatoid, and biphasic histology types, with well-annotated clinicopathologic information.
This TMA has been used to characterize the expression of several IHC markers (see Objective
3).

MPM Clinical Trial Tissue Collection and Processing. In collaboration with Project 4, the
Pathology Core has collected, banked, and characterized MPM tumor tissue from 10 patients
enrolled in the MPM dasatinib clinical trial who underwent video-assisted thoracoscopy (VAT)
and extrapleural pneumonectomy (EPP). A total of 172 (91 baseline [VAT] and 81 at surgery
[EPP]) fresh frozen and formalin-fixed tumor tissue specimens have been obtained, processed,
and characterized by the Pathology Core.

Objective  3.  Perform  and  evaluate  immunohistochemical  (IHC)  analysis  in  human  tumor 
tissue  specimens  and  mouse  xenograft  tumor  specimens. 

Summary  of  Research  Findings 

The  Pathology  Core  has  assisted  and  performed  IHC  analysis  for  a  number  of  markers  using 
TMAs  and  whole  sections  in  tumor  tissue  specimens  of  NSCLC  and  MPM  in  collaboration  with 
Projects  2  (Dr.  D.  Stewart),  3  (Dr.  I.  Wistuba),  and  4  (Dr.  A.  Tsao). 

Project 2. Using IHC, 18 proteins associated with senescence, proliferation, apoptosis and other
tumor-related phenomena (Table 3) have been examined in a set of NSCLC TMAs containing
330 tumors, including 220 adenocarcinomas and 110 squamous cell carcinomas. Annotated
clinicopathologic information, including overall and recurrence-free survival with a median
follow-up of 7.2 years, is available for all these cases. The IHC data obtained are being analyzed
by the Biostatistics and Bioinformatics Core (Dr. K. Coombes) (Figure 3). From the preliminary
analysis, at least 3 groups of patients have been identified by the expression of the IHC markers
examined. The clinicopathologic characteristics of these cases, including their outcomes, are
currently under evaluation.

Figure 2. Microphotographs showing a representative example of MPM frozen tissue


Table 3. List of IHC markers examined in NSCLC TMAs in collaboration with Project 2.

p53          CTR1         ERCC1        SHARP2       p21          RB
SURVIVIN     DcR2         Ki67         p16 INK4a    HIF1α        TUNEL
COX2         p14 ARF      CA IX        DNMT1        TGFβ         VEGF

In addition, in collaboration with Project 2, we have examined IHC expression of cell membrane
transporters, including copper transporter receptor 1 (CTR1), glucose transporter 4 (GLUT4) and
RHOA, and folate receptor alpha (FOLR1) and reduced folate carrier 1 (RFC1), in TMAs from
NSCLC and MPM. The data on NSCLC were presented at the 2009 AACR Meeting (poster,
April 2009, Denver, CO) and will be presented at the IASLC World Lung Cancer meeting (oral
presentation, July 2009, San Francisco, CA). A manuscript is in preparation.


Project 3. In collaboration with this Project, a series of cancer stem cell (CSC) markers have
been examined by IHC using NSCLC TMAs (n=330 cases), including EZH2, SOX2, CD24,
CD44, C-kit, BMI-1, HEY1, HEY2, and Oct3/4. These data will be presented at the IASLC World
Lung Cancer meeting (oral presentation, July 2009, San Francisco, CA), and a manuscript is in
preparation. In addition, the Pathology Core has contributed to the analysis of Keap1/Nrf2
protein and gene expression in NSCLC TMAs and frozen tissue specimens.

Project 4. The MPM TMAs have been utilized to characterize the expression of several markers,
including markers related to epithelial-to-mesenchymal transition (EMT; 5 IHC markers),
angiogenesis (PDGFRβ; 2 IHC markers and FISH for the gene), and cell membrane
transporters (5 IHC markers). In addition, the recently acquired MPM cell lines from Dr. Harvey
Pass are currently being characterized by IHC using 7 different markers to confirm that they are
authentic mesothelioma cell lines, including cytokeratin 5/6, calretinin, mesothelin, CEA,
B72.3, CD15, and TTF-1. Finally, the Pathology Core has optimized and examined by IHC the
expression of total Src and p-Src (Tyr 416), as well as Ki67, in nearly 100 MPM tissue samples
obtained from patients enrolled in the dasatinib clinical trial.
Key Research Accomplishments

•  Collected  prospective  frozen  tissue  specimens  from  272  NSCLC  and  19  MPM  cases, 
including  79  NSCLC  cases  treated  with  neo-adjuvant  chemotherapy. 

• Established an NSCLC and MPM cell line repository at the M. D. Anderson Cancer Center in
collaboration with Projects 1 and 4.

• Performed extraction, with detailed histology quality control, from tumor and corresponding
normal tissues of 613 NSCLCs and 53 MPMs, which will be used for profiling analysis (Project 3).

•  Collected,  processed,  and  analyzed  172  MPM  tumor  tissue  specimens  from  patients 
enrolled  in  the  dasatinib  clinical  trial  (Project  4). 

Conclusions 


During the second year, the PROSPECT Pathology Core has achieved and exceeded its goals
by prospectively collecting frozen tissue specimens from 272 NSCLC and
19 MPM cases, including 79 NSCLC cases treated with neo-adjuvant chemotherapy. We have
expanded the MPM cell line repository to 17 cell lines. The Pathology Core has played an
important role in the processing of NSCLC and MPM tissue specimens for profiling, and in the
characterization of protein expression in tissue specimens by
immunohistochemistry.


Biostatistics/Bioinformatics  Core 

(Director:  Dr.  J.  Jack  Lee;  Co-Director:  Kevin  Coombes) 

In close collaboration with the Pathology Core and each of the five main projects, the
Biostatistics and Data Management Core (BDMC) for the Department of Defense (DoD)
PROSPECT lung cancer research program is a comprehensive, multilateral resource for
designing clinical and basic science experiments; developing and applying innovative statistical
methodology, data acquisition and management, and statistical analysis; and publishing
translational research generated by this research proposal. We deliver planned and tailored
statistical analyses for rapid communication of project results among project investigators, and
we collaborate with all project investigators to facilitate the timely publication of scientific
results.

The  main  objectives  of  the  Biostatistics  and  Data  Management  Core  are  to: 

1. Provide the statistical design, sample size, and power calculations for each project.

2.  Develop  a  secure,  internet-driven,  Web-based  database  application  to  integrate  data 
generated  by  the  five  proposed  projects  and  the  Pathology  Core  of  the  PROSPECT 
research  project. 

3.  Develop  a  comprehensive,  Web-based  database  management  system  for  tissue  specimen 
tracking  and  distribution  and  for  a  central  repository  of  all  biomarker  data. 

4.  Provide  all  statistical  data  analyses,  including  descriptive  analysis,  hypothesis  testing, 
estimation,  and  modeling  of  prospectively  generated  data. 

5.  Provide  prospective  collection,  entry,  quality  control,  and  integration  of  data  for  the  basic 
science,  pre-clinical,  and  clinical  studies  in  the  PROSPECT  grant. 

6.  Provide  study  monitoring  and  conduct  of  the  neoadjuvant  clinical  trial  that  ensures  patient 
safety  by  timely  reporting  of  toxicity  and  interim  analysis  results  to  various  institutional 
review  boards  (IRBs),  the  UTMDACC  data  monitoring  committee,  the  DoD,  and  other 
regulatory  agencies. 
7. Generate statistical reports for all projects.

8.  Collaborate  with  all  project  investigators  and  assist  them  in  publishing  scientific  results. 

9.  Develop  and  adapt  innovative  statistical  and  genomic  methods  pertinent  to  biomarker- 
integrated  translational  lung  cancer  studies. 

Summary  of  Research  Findings 

In  the  second  funding  year,  the  BDMC  continued  to  work  with  all  project  investigators  in 
providing  biostatistics  and  data  management  support.  The  accomplishments  are  summarized 
below. 

Biostatistics.  We  worked  with  clinical  investigators  to  provide  the  biostatistical  support  in  the 
development  and  revision  of  PROSPECT  protocols.  We  provided  statistical  reports  on  a 
monthly  basis  to  update  the  accrual,  randomization,  and  demographic  data  for  all  projects 
involved. 

We have developed and evaluated the statistical methodology used for comparing various test
statistics for response adaptive randomization (BMC Medical Research Methodology). We have
also placed emphasis on applying the Emax model, the interaction index, and the bivariate thin
plate splines for drug interaction assessment in combination studies (Frontiers of Biosciences).
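
To make the Emax model and the interaction index concrete, the following sketch uses their standard textbook forms: the sigmoid Emax dose-effect curve and the Loewe additivity interaction index, d1/D1 + d2/D2, where D1 and D2 are the single-agent doses that alone produce the same effect as the combination. This is not the Core's implementation; the function names, parameter names, and example values are hypothetical.

```python
def emax_effect(dose, e0, emax, ed50, hill=1.0):
    """Sigmoid Emax model: E = E0 + Emax * d^h / (ED50^h + d^h)."""
    if dose < 0 or ed50 <= 0:
        raise ValueError("dose must be >= 0 and ed50 must be > 0")
    return e0 + emax * dose ** hill / (ed50 ** hill + dose ** hill)


def dose_for_effect(effect, e0, emax, ed50, hill=1.0):
    """Invert the Emax model: single-agent dose that yields `effect`."""
    fraction = (effect - e0) / emax
    if not 0.0 < fraction < 1.0:
        raise ValueError("effect is outside the range the drug can reach")
    return ed50 * (fraction / (1.0 - fraction)) ** (1.0 / hill)


def interaction_index(d1, d2, effect, drug1, drug2):
    """Loewe interaction index for doses (d1, d2) that produce `effect`.

    Values below 1 suggest synergy, 1 additivity, above 1 antagonism.
    """
    single_dose_1 = dose_for_effect(effect, **drug1)
    single_dose_2 = dose_for_effect(effect, **drug2)
    return d1 / single_dose_1 + d2 / single_dose_2


# Hypothetical parameters: a drug combined with itself must be additive.
drug = {"e0": 0.0, "emax": 1.0, "ed50": 2.0, "hill": 1.0}
half_effect = emax_effect(2.0, **drug)
sham_index = interaction_index(1.0, 1.0, 0.5, drug, drug)
```

The sham combination (a drug combined with itself) is a useful sanity check: splitting the ED50 dose into two halves should return an index of 1.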

In collaboration with the University of Texas Lung SPORE, we continued to work on developing
semantic database models for the assay data being generated by both the PROSPECT projects
and the Lung SPORE projects (PLoS ONE, AMIA Annu Symp Proc).

We  continue  to  work  on  developing  statistical  methods  for  processing  and  analyzing  the 
reverse-phase  protein  array  (RPPA)  data  that  continue  to  be  generated  as  part  of  the 
PROSPECT  study  of  lung  cancer  (BMC  Bioinformatics,  Bioinformatics). 

We  have  performed  (and  continue  to  perform)  analyses  of  PROSPECT  data.  Although  these 
analyses  have  not  yet  resulted  in  publications,  they  are  expected  to  do  so  in  future  project 
periods.  These  analyses  include: 

1. Analysis of an initial set of immunohistochemically stained tissue microarray data looking at
markers of prognosis in lung cancer samples. Univariate analysis identified a number of
markers that appear to be related either to important clinical covariates or to clinically
relevant outcomes (overall survival, disease-free survival, or recurrence-free survival). We
are in the process of performing multivariate analyses to identify robust signatures of these
outcomes using the same kinds of methods that have been developed in the field of gene
expression microarrays.

2.  We  have  developed  a  novel  method  to  find  comparative  signatures  of  drug  response  by 
simultaneously  modeling  the  differential  response  of  cell  lines  to  two  different  drugs.  This 
method  was  developed  using  RPPA  data  and  dose  response  data  from  both  lung  cancer 
and  head  and  neck  cancer  cell  lines;  the  lung  cancer  data  was  collected  as  part  of  the 
PROSPECT  grant. 

3.  We  are  preparing  a  statistical  methods  manuscript  that  uses  models  to  evaluate  methods 
that  simultaneously  discover  markers  and  identify  subsets  of  patients  who  receive  greater 
benefit  from  certain  drugs  in  a  multi-arm  clinical  trial. 

4. We have recently received a full set of combined Affymetrix gene expression data and
Agilent microarray measurements of miRNA expression. Analysis is underway, with the first
step being to analyze each technology separately to discover new prognostic markers. At a
later stage, we will integrate the data by accounting for known or predicted interactions
between miRNA and mRNA molecules.

Data Management. The PROSPECT database development takes advantage of the
ReVITALization effort from the DoD-sponsored VITAL program because the databases developed
for the two projects are similar. To tailor the database to PROSPECT-specific needs, database
extensions were made to allow the collection and management of data from multiple studies,
including the neoadjuvant studies, adjuvant studies, and regular chemotherapy studies. In
addition, the PROSPECT database was developed to extend the ReVITALization database in VITAL
to provide additional clinical, pathological, and biomarker data repositories and tissue
tracking. In this funding period, we continued our database development effort and made
updates to improve the function and usability of the database.

The application consists of a SQL Server 2005 database and an ASP.NET web application
implemented in VB.NET. Queries and SQL Server 2005 reports are provided. Secure Sockets Layer
(SSL) and secured database passwords are used to keep data transactions protected and
confidential. The tissue data include clinical and pathological data.

1)  The  database’s  clinical  module  contains  the  following  Web  forms: 

Patient  Information 

Social  History  (Alcohol  and  Smoking  history) 

Medical  History 
Other  Malignancy 

Treatments  (Surgery,  Chemotherapy,  Radiotherapy  and  Other  Treatments) 

Clinical  Staging 
Follow  up 

2)  The  pathological  module  contains  the  following  Web  forms: 

Primary  and  Metastasis  data  (Diagnosis  and  Surgery  Specimens) 

Histology 

Staging  and  Tumor  Information:  Cancer  staging  (TNM  classification)  is  automatically 
determined  by  the  system  based  on  the  tumor  information  provided. 

Tissue  Bank  (Frozen  Tissue  and  Paraffin) 

3)  Reports:  Several  Excel  reports  are  provided  for  clinical  and  pathological  modules. 

Clinical  Report 
Pathological  Report 
Patient  Report 
Accession  Report 
General  Information  Report 
Other  Malignancy  Report 
Surgery  Report 
Chemotherapy  Report 
Radiotherapy  Report 
Other Treatment Report
Staging  Report 
Follow  up  Report 
Histology  Diagnosis  Report 

4) Dictionaries: The database allows users to update dictionaries; however,
dictionary deletion is prohibited.
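
The dictionary rule in item 4 (updates allowed, deletion prohibited) can be pictured with a small sketch. The production system is described above as a SQL Server 2005 and VB.NET application; the Python class below is only an illustration of the rule, and its name, methods, and example entries are hypothetical.

```python
class UpdateOnlyDictionary:
    """Lookup table whose entries can be added or updated but never deleted."""

    def __init__(self):
        self._entries = {}

    def upsert(self, code, label):
        """Add a new entry or update the label of an existing one."""
        self._entries[code] = label

    def lookup(self, code):
        """Return the label stored for `code`."""
        return self._entries[code]

    def delete(self, code):
        """Deletion is prohibited, so existing records keep valid codes."""
        raise PermissionError("dictionary entries cannot be deleted")


# Hypothetical histology dictionary (illustration only).
histology = UpdateOnlyDictionary()
histology.upsert("ADC", "Adenocarcinoma")
histology.upsert("SCC", "Squamous cell carcinoma")
histology.upsert("ADC", "Adenocarcinoma, NOS")  # update in place
```

Prohibiting deletion is a common way to keep previously entered records from pointing at codes that no longer exist.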
Key Research Accomplishments

•  Developed  a  secured,  Web-based  database  application  to  assist  the  study  conduct. 

•  Performed  database  maintenance,  training,  and  support. 

•  Provided  data  integrity  and  data  correction. 

•  Updated  dictionaries  and  added  data  fields. 

•  Updated  project  reports. 

•  Provided  more  links  to  make  data  navigation  easier. 

Reportable  Outcomes 

A Web-based database application has been developed and deployed at:
https://insidebiostat/DMI  PROSPECT/Common/Loqin.aspx 

Conclusions 


In  collaboration  with  clinical  investigators,  research  nurses,  the  Biomarker  Core,  and  basic 
scientists,  the  Biostatistics  and  Data  Management  Core  has  continued  to  deliver  biostatistics 
and  data  management  support  as  proposed.  Further  support  and  analysis  will  be  provided  in 
the  future  project  period. 
KEY RESEARCH ACCOMPLISHMENTS

PROJECT  1 

•  Completed  protein  profiling  and  gene  expression  profiling  for  50  NSCLC  cell  lines. 

•  Derived  baseline  gene  expression  signatures  predictive  of  response  by  correlating  mRNA 
expression  with  drug  response. 

•  Derived  proteomic  drug  response  signatures  by  correlating  proteomic  profiles  with  drug 
response  data  for  a  variety  of  drugs. 

•  Using  baseline  proteomic  profiles,  markers  of  radiation  sensitivity  and  resistance  were 
identified  in  lung  cancer  cell  lines  (Yordy  et  al.,  ASTRO  2008;  Yordy  et  al.,  ASCO  2008). 

• Identified factors associated with age and sex differences in NSCLC (Herynk et al.,
Proceedings of the Flight Attendants Medical Research Institute, 2009; Herynk et al.,
Proceedings of the International Association for the Study of Lung Cancer, 2009).

•  Identified  SRC  as  a  potential  biomarker  of  response  to  the  EGFR  inhibitor. 

PROJECT  2 

• Collected tumor specimens from 291 lung cancer patients (including 74 who had received
neoadjuvant chemotherapy).

• Collected blood samples from 283 lung cancer patients (including 64 who had received
neoadjuvant chemotherapy).

•  Performed  preliminary  assessment  of  impact  of  18  biomarkers  on  survival,  and  their 
correlation  with  stage  and  tumor  type. 

PROJECT  3 

• Extracted DNA and RNA from over 600 NSCLC and 53 MPM specimens with annotated
clinicopathologic information for profiling analysis.

•  Developed  an  mRNA  prognostic  signature  for  NSCLC  using  FFPE  tissue  specimens. 

•  Performed  mRNA  and  miRNA  molecular  profiling  in  53  MPM  tumor  and  cell  line 
specimens. 

•  Collected  >200  frozen  NSCLC  tissue  specimens  from  patients  who  received  neoadjuvant 
therapy,  and  evaluated  the  pathological  response  to  chemotherapy  in  133  cases. 

• Characterized NSCLC tissue specimens for novel biomarkers associated with resistance
to chemotherapy in lung cancer, including Nrf2/Keap1, membrane transporters, and cancer
stem cell markers.

PROJECT  4 

• Demonstrated that the Src Tyr419 biomarker accurately predicts radiographic
response in patients receiving dasatinib therapy.

•  Enrolled  14  patients  on  the  clinical  trial. 

•  Contributed  specimens  used  in  construction  of  an  MPM  tissue  microarray  and  172  MPM 
tumor  tissue  specimens  from  the  clinical  trial  to  the  MPM  tissue  bank. 
PROJECT 5

• Optimized and validated QD-staining conditions for multiplexing three biomarkers (EGFR,
E-cadherin, and β-catenin) in both cell lines and FFPE tissues.

•  Developed  a  quantification  method  for  QD  signals  using  the  CRi  Nuance  spectral  system. 

•  Collected  training  set  materials  including  94  cases  of  NSCLC  and  their  adjacent  normal 
tissues,  and  entered  clinical  information  into  a  database  for  further  analysis. 

• Completed staining of the three biomarkers in the 94 pairs of NSCLC tissues by both
IHC and QD-IHF methods. The imaging and statistical analyses are ongoing.

PATHOLOGY  CORE 

•  Collected  prospective  frozen  tissue  specimens  from  272  NSCLC  and  19  MPM  cases, 
including  79  NSCLC  cases  treated  with  neo-adjuvant  chemotherapy. 

•  Established  a  NSCLC  and  MPM  cell  line  repository  at  the  M.  D.  Anderson  Cancer  Center  in 
collaboration  with  Projects  1  and  4. 

• Performed extraction, with detailed histology quality control, of 613 NSCLC and 53 MPM
tumors and corresponding normal tissues, which will be used for profiling analysis (Project 3).

•  Collected,  processed,  and  analyzed  172  MPM  tumor  tissue  specimens  from  patients 
enrolled  in  the  dasatinib  clinical  trial  (Project  4). 

BIOSTATISTICS  AND  DATA  MANAGEMENT  CORE 

• Developed a secure, Web-based database application to support the conduct of the study.

•  Performed  database  maintenance,  training,  and  support. 

•  Provided  data  integrity  and  data  correction. 

•  Updated  dictionaries  and  added  data  fields. 

•  Updated  project  reports. 

•  Provided  more  links  to  make  data  navigation  easier. 
REPORTABLE OUTCOMES


Publications  (Attached  in  Appendix  A) 

Blanco  R,  Iwakawa  R,  Tang  M,  Kohno  T,  Angulo  B,  Pio  R,  Montuenga  LM,  Minna  JD,  Yokota  J, 
Sanchez-Cespedes  M.  A  gene-alteration  profile  of  human  lung  cancer  cell  lines.  Human 
Mutation.  2009  May  20.  PMID:  19472407. 

Deus  HF,  Stanislaus  R,  Behrens  C,  Wistuba  I,  Minna  JD,  Garner  HR,  Swisher  SG,  Roth  J, 
Correa  A,  Broom  B,  Coombes  K,  Almeida  JS.  Data  driven  semantic  integration  of  translational 
lung  cancer  research  at  M.D.  Anderson  Cancer  Center.  American  Medical  Informatics 
Association  Annual  Symposium  Proceedings.  2008  Nov  6:927.  PMID:  18999102. 

Deus  HF,  Stanislaus  R,  Viega  DF,  Behrens  C,  Wistuba  II,  Minna  JD,  Garner  HR,  Swisher  SG, 
Roth  JA,  Correa  AM,  Broom  B,  Coombes  K,  Chang  A,  Vogel  LH,  Almeida  JS.  A  semantic  web 
management  model  for  integrative  biomedical  informatics.  PLoS  ONE.  2008  Aug  13;3(8):e2946. 
PMCID:  PMC2491554. 

Gazdar  AF,  Minna  JD.  Deregulated  EGFR  signaling  during  lung  cancer  progression:  mutations, 
amplicons, and autocrine loops. Cancer Prevention Research. 2008 Aug;1(3):156-60.
PMID: 19138950.

Ji  L,  Roth  JA.  Tumor  suppressor  FUS1  signaling  pathway.  Journal  of  Thoracic  Oncology.  2008 
Apr;3(4):327-30.  PMID:  18379348. 

Huang  DH,  Su  L,  Peng  XH,  Zhang  H,  Khuri  FR,  Shin  DM,  Chen  ZG.  Quantum  dot-based 
quantification  revealed  differences  in  subcellular  localization  of  EGFR  and  E-cadherin  between 
EGFR-TKI  sensitive  and  insensitive  cancer  cells.  Nanotechnology.  2009  Jun  3;20(22):225102. 
PMID:  19433879. 

Larsen  JE,  Spinola  M,  Gazdar  AF,  Minna  JD.  An  overview  of  the  molecular  biology  of  lung 
cancer.  Lung  Cancer:  Principles  and  Practice.  4th  edition.  Philadelphia:  Lippincott  Williams  & 
Wilkins,  2009. 

Neeley  ES,  Kornblau  SM,  Coombes  KR,  Baggerly  KA.  Variable  slope  normalization  of  reverse 
phase  protein  arrays.  Bioinformatics.  2009  Jun  1;25(11):  1384-9.  PMID:  19336447. 

Shames DS, Minna JD. IP6K2 is a client for HSP90 and a target for cancer therapeutics
development. Proceedings of the National Academy of Sciences USA. 2008 Feb 5;105(5):1389-90.
PMCID: PMC2234151.

Sos  ML,  Koker  M,  Weir  BA,  Heynck  S,  Rabinovsky  R,  Zander  T,  Seeger  JM,  Weiss  J,  Fischer  F, 
Frommolt  P,  Michel  K,  Peifer  M,  Mermel  C,  Girard  L,  Peyton  M,  Gazdar  AF,  Minna  JD, 
Garraway  LA,  Kashkar  H,  Pao  W,  Meyerson  M,  Thomas  RK.  PTEN  loss  contributes  to  erlotinib 
resistance  in  EGFR-mutant  lung  cancer  by  activation  of  Akt  and  EGFR.  Cancer  Research.  2009 
Apr 15;69(8):3256-61. PMID: 19351834.

Sos  ML,  Michel  K,  Zander  T,  Weiss  J,  Frommolt  P,  Peifer  M,  Li  D,  Ullrich  R,  Koker  M,  Fischer  F, 
Shimamura  T,  Rauh  D,  Mermel  C,  Fischer  S,  Stuckrath  I,  Heynck  S,  Beroukhim  R,  Lin  W, 
Winckler W, Shah K, LaFramboise T, Moriarty WF, Hanna M, Tolosi L, Rahnenfuhrer J,
Verhaak R, Chiang D, Getz G, Hellmich M, Wolf J, Girard L, Peyton M, Weir BA, Chen TH,
Greulich H, Barretina J, Shapiro GI, Garraway LA, Gazdar AF, Minna JD, Meyerson M, Wong
KK, Thomas RK. Predicting drug susceptibility of non-small cell lung cancers based on genetic
lesions. Journal of Clinical Investigation. 2009 Jun;119(6):1727-40. doi: 10.1172/JCI37127.
PMCID: PMC2689116.

Stanislaus  R,  Carey  M,  Deus  HF,  Coombes  KR,  Hennessy  BT,  Mills  GB,  Almeida  JS. 
RPPAML/RIMS:  A  meta  data  format  and  an  information  management  system  for  reverse  phase 
protein  arrays.  BMC  Bioinformatics.  2008  Dec  22;9:555.  PMCID:  PMC2639439. 

Stewart  D,  Issa  JP,  Kurzrock  R,  Nunez  M,  Jelinek  J,  Hong  D,  Yasuhiro  O,  Guo  Z,  Gupta  S, 
Wistuba I. Decitabine effect on tumor global DNA methylation and other parameters in a phase
I  trial  in  refractory  solid  tumors  and  lymphomas.  Clinical  Cancer  Research.  2009  Jun 
1;15(11):3881-8.  PMID:  19470736. 

Zhang  L,  Wei  Q,  Mao  L,  Liu  W,  Mills  GB,  Coombes  KR.  Serial  dilution  curve:  a  new  method  for 
analysis  of  reverse  phase  protein  array  data.  Bioinformatics.  2009  Mar  1;25(5):650-4.  PMCID: 
PMC2647837. 

Manuscripts  submitted,  in  revision,  or  in  review  (Attached  in  Appendix  A) 

Huang DH, Peng HX, Su L, Wang DS, Khuri FR, Shin DM, Chen ZG. Optimization
and Comparison of Multiplexed Quantum Dot Immunohistofluorescence. Biomaterials.
Submitted, 2009.

Imai  H,  Sunaga  N,  Shimizu  Y,  Yanagitani  N,  Kaira  K,  Tomizawa  Y,  Ishizuka  T,  Minna  JD,  Mori 
M.  Overexpression  of  CXCL12  and  its  receptors  CXCR4  and  CXCR7  in  lung  cancer:  CXCL12 
as  a  potential  molecular  target  for  lung  cancer.  Genes  Chromosomes  Cancer.  Submitted,  2009. 

Jayachandran G, Roth J, Ji L. Analysis of Protein-protein Interaction using ProteinChip
Array-based SELDI-TOF Mass Spectrometry. Methods in Molecular Biology. The Humana Press,
2009. (in press).

Jeong  Y,  Xie  Y,  Xiao  G,  Xie  XJ,  Behrens  C,  Girard  L,  Patz  Jr  EF,  Wistuba  II,  Minna  JD, 
Mangelsdorf  DJ.  Nuclear  Receptor  Expression  Defines  a  Set  of  Prognostic  Biomarkers  for  Lung 
Cancer. Cancer Cell. (Submitted CANCER-CELL-S-09-00260), 2009.

Ji  L,  Jayachandran  G,  Roth  J.  High  Throughput  Profiling  of  Serum  Phosphoproteins/peptides 
Using  the  SELDI-TOF-MS  Platform.  Methods  in  Molecular  Biology.  The  Humana  Press,  2009. 
(in  press). 

Kong  M,  Lee  JJ.  Applying  Emax  Model  and  Bivariate  Thin  Plate  Splines  to  Assess  Drug 
Interactions.  Frontiers  of  Biosciences.  In  press,  2009. 

Lee JJ, Gu X. A simulation study for comparing testing statistics in response-adaptive
randomization. BMC Medical Research Methodology. (in revision).

Lee  JJ,  Lin  HY,  Liu  DD,  Kong  M.  Applying  Emax  model  and  interaction  index  for  assessing  drug 
interaction  in  combination  studies.  Frontiers  of  Biosciences.  In  press,  2009. 
Ramos AH, Dutt A, Mermel C, Perner S, Cho J, Lafargue CJ, Johnson LA, Tanaka K, Bass AJ,
Barretina J, Weir BA, Beroukhim R, Thomas RK, Minna J, Chirieac LR, Lindeman NI, Beer DG,

Stewart DJ. Non-small cell lung cancer patient survival when assessed as a first order
nonlinear process: effect of therapy and stage. Submitted for publication.

patients with I-III non-small cell lung cancer (NSCLC). ASCO Annual Meeting, Orlando, FL,
June 2009.

CONCLUSIONS


PROJECT 1: RPPA proteomic profiling and gene expression profiling were performed for a large
number of cell lines and have provided the basis for identifying intracellular signaling pathways
and proteins associated with sensitivity and resistance to chemotherapies and targeted agents
in NSCLC cell lines and tumor samples. These profiles will allow for multiple biomarker
analyses. One of the identified markers, SRC-3, was found to be correlated with resistance to
EGFR inhibitors. Inhibition of SRC-3 in a gefitinib-resistant cell line reversed resistance to
the inhibitor. These results show that the model can identify relevant biological targets whose
inhibition reverses resistance to a targeted agent. Our findings will be further investigated by
correlating RPPA profiles of tumor samples with clinical outcomes in samples from the BATTLE-1
trial and other clinical samples, with the goal of developing predictive markers that can guide
treatment selection and identify new targets in NSCLC.

PROJECT 2: During this project period, we identified, and are currently assessing the quality
of, the RNA, DNA, and protein available from the collected tumor specimens prior to the full
analysis under PROSPECT. Specimen collection continues at a brisk pace and will further our
goal of predicting future sites of relapse by examining the molecular profiles associated with
the patient tissues. Further analysis is needed to assess the extent to which the TTF/gene
expression molecular profile at diagnosis may help guide the choice of therapies at relapse.

PROJECT  3:  During  the  second  project  period,  we  reached  our  collection  goal  of  NSCLC 
tissues  from  patients  who  received  neoadjuvant  chemotherapy,  and  finalized  the  extraction  of 
DNA  and  RNA  for  molecular  profiling  of  chemo-naive  surgically  resected  NSCLCs.  We  have 
initiated  the  molecular  profiling  of  lung  cancer,  and  developed  an  NSCLC  prognostic  mRNA 
signature  using  FFPE  tissues.  We  are  in  the  process  of  completing  comprehensive  (mRNA, 
miRNA,  DNA  and  protein)  profiling  analyses  of  MPM  tissue  and  cell  line  specimens. 

PROJECT 4: We have demonstrated that this novel trial design is feasible, and preliminary
evidence suggests that our Src Tyr419 biomarker accurately predicts radiographic
response in patients receiving dasatinib therapy. There is a subpopulation of MPM patients that
may derive clinical benefit from oral dasatinib therapy. MPM is a very heterogeneous tumor, and
molecular profiling will be necessary in future studies to ultimately optimize targeted therapy in
this disease.

Preliminary evidence suggests that modulation of p-Src Tyr419 is a feasible, reasonable
pharmacodynamic biomarker for dasatinib. Future plans include correlating outcome and tumor
p-Src Tyr419 with peripheral surrogate markers in blood/serum/platelets and pleural effusion, and
analyzing pathways of resistance in MPM tumors.

PROJECT 5: In the past year, we completed the proposed cell line studies in Specific Aim 1
and published the results in Nanotechnology. Our findings provide new biomarkers and a QD
methodology for predicting sensitivity to EGFR-targeting therapy, which can be applied to tumor
tissue specimens for clinical application. Furthermore, clarifying the substantial differences
between EGFR-TKI sensitive and insensitive cancer cells will help to explain the
mechanism of resistance to EGFR-targeted therapy and facilitate the development of new targeted
therapies. During the project period, we focused on optimization and validation of a
quantification strategy for QD-based IHF. These studies provided a solid foundation for
analyzing biomarker expression in NSCLC tissues. Using this strategy, we have completed the
immunostaining of three biomarkers (EGFR, E-cadherin, and β-catenin) in 94 pairs of the
patients' tissue samples. Further imaging and statistical analyses of these stains will answer
an important question: whether quantification of multiplexed biomarkers by QD-IHF can provide
a more accurate correlation with patient prognosis and other relevant clinical information than
a single-biomarker analysis.

PATHOLOGY CORE: During the second year, the PROSPECT Pathology Core met and exceeded
its goals by prospectively collecting frozen tissue specimens
from 272 NSCLC and 19 MPM cases, including 79 NSCLC cases treated with neo-adjuvant
chemotherapy. We have expanded the MPM cell line repository to 17 cell lines. The Pathology
Core has played an important role in processing NSCLC and MPM tissue specimens for
profiling, and in characterizing protein expression in tissue specimens
by immunohistochemistry.

BIOSTATISTICS  AND  DATA  MANAGEMENT  CORE:  In  collaboration  with  clinical 
investigators,  research  nurses,  the  Biomarker  Core,  and  basic  scientists,  the  Biostatistics  and 
Data  Management  Core  has  continued  to  deliver  biostatistics  and  data  management  support  as 
proposed.  Further  support  and  analysis  will  be  provided  in  the  future  project  period. 
ABSTRACT: Aberrant proteins encoded from genes altered in tumors drive cancer development
and may also be therapeutic targets. Here we derived a comprehensive gene-alteration profile
of lung cancer cell lines. We tested 17 genes in a panel of 88 lung cancer cell lines and found
the rates of alteration to be higher than previously thought. Nearly all cells feature
inactivation at TP53 and CDKN2A or RB1, whereas BRAF, MET, ERBB2, and NRAS alterations were
infrequent. A preferential accumulation of alterations among histopathological types and a
mutually exclusive occurrence of alterations of CDKN2A and RB1 as well as of KRAS, epidermal
growth factor receptor (EGFR), NRAS, and ERBB2 were seen. Moreover, in non-small-cell lung
cancer (NSCLC), concomitant activation of signal transduction pathways known to converge in
mammalian target of rapamycin (mTOR) was common. Cells with single activation of ERBB2, PTEN,
or MET signaling showed greater sensitivity to cell-growth inhibition induced by erlotinib,
LY294002, and PHA665752, respectively, than did cells featuring simultaneous activation of
these pathways, underlining the need for combined therapeutic strategies in targeted cancer
treatments. In conclusion, our gene-alteration landscape of lung cancer cell lines provides
insights into how gene alterations accumulate and biological pathways interact in cancer.
Hum Mutat 30:1-8, 2009. © 2009 Wiley-Liss, Inc.

KEY WORDS: lung cancer; oncogenes; tumor suppressors; tyrosine kinase inhibitors


Introduction 

Characterization of accumulated genetic alterations in cancer
cells is important not only to understand tumor biology, but also
to guide drug design and select patients who might benefit from a
given targeted cancer therapy. The promise of using proteins
encoded by mutated cancer genes, mainly kinases encoded by
oncogenes, as molecular targets for the development of novel
therapies, drives endeavors to identify novel mutated cancer genes
and to create catalogues of somatic mutations in cancer [Wang
et al., 2004; Sjoblom et al., 2006; Greenman et al., 2007; Thomas
et al., 2007]. The paradigm of the latter is the Catalogue of
Somatic Mutations in Cancer (COSMIC) database of the Wellcome
Trust Sanger Institute (www.sanger.ac.uk/cosmic) [Forbes
et al., 2006], which brings together data on the mutation status of
hundreds of cancer-related genes in primary tumors and cancer
cell lines from a wide variety of tumor types.

Additional Supporting Information may be found in the online version of this article.

In the particular case of lung cancer, several gene alterations are
known to contribute to its development, including activating
mutations and gene amplification at the oncogenes BRAF (MIM#
164757), epidermal growth factor receptor (EGFR) (MIM#
131550), ERBB2 (MIM# 164870), KRAS (MIM# 190070), NRAS
(MIM# 164790), PIK3CA (MIM# 171834), MYC (MIM# 190080),
MYCL1 (MIM# 164850), and MYCN (MIM# 164840), as well as
inactivating intragenic mutations, homozygous deletions, and
promoter hypermethylation at the tumor suppressor genes
BRG1/SMARCA4 (MIM# 603254), LKB1/STK11 (MIM# 602216), PTEN
(MIM# 601728), CDKN2A (MIM# 600160),
RB1 (MIM# 180200), and TP53 (MIM# 191170) [Sanchez-Cespedes,
2007; Medina et al., 2008]. Some of these gene alterations
are known to be specific to lung tumor histologies [Westra et al.,
1993; Otterson et al., 1994; Kelley et al., 1995; Sanchez-Cespedes,
2007; Medina et al., 2008]. In addition, it is also well established
that some gene alterations are mutually exclusive, as is the case for
pairs of genes, such as KRAS and EGFR, or CDKN2A and RB1
[Otterson et al., 1994; Lynch et al., 2004; Paez et al., 2004], that
encode proteins acting in the same signaling pathway. However, a
profile of alterations at multiple well-known cancer genes in a large
panel of lung cancers has never been reported. This limits our
understanding of how gene alterations are distributed among lung
tumors and how they interact with one another.

Here, we attempt to delineate the gene-alteration profile of lung
cancer cell lines by screening for alterations of seventeen well-known
cancer genes, including point mutations at AKT1 (MIM#
164730)  and  EML4-ALK  (MIM#  607442  for  EML4  and  MIM# 
105590  for  ALK)  fusions,  a  small  inversion  within  chromosome  2p 
recently  reported  in  a  small  subset  of  non-small-cell  lung  cancers 
(NSCLCs)  [Carpten  et  al.,  2007;  Soda  et  al.,  2007].  We  examined 
the  association  between  the  genetic  alteration  profile  and  the 
response  to  specific  small  molecule  inhibitors. 




Material  and  Methods 

Cell  Lines 

Cells were maintained in culture flasks in either DMEM (A549,
NCI-H1299, NCI-H23, Calu-3, NCI-H522, and EBC1) or RPMI
1640 (NCI-H446, NCI-H1650, NCI-H460, and NCI-N417)
(Invitrogen, Carlsbad, CA) supplemented with 10% (v/v) fetal
bovine serum, 2 mM L-glutamine, 50 mg/ml penicillin/streptomycin,
and 2.5 µg/ml fungizone. Cultures were kept at 37°C in a
humidified atmosphere of 5% CO2/95% air. DNA, RNA, and
protein were extracted using standard protocols.

Screening  for  Gene  Mutations  and  Deletions 

Screening for mutations in AKT1 (exon 3), BRAF (exons 11 and
15), MET (MIM# 164860) (exons 16-20), ERBB2 (exon 20), EGFR
(exons 18-21), NRAS (codons 12, 13, and 61), PIK3CA (exons 1,
9, and 20), PTEN (exons 2-9), and CDKN2A (exons 1-3) was
performed by directly sequencing PCR products using primers
and conditions that have been previously described [Matsumoto
et al., 2007; Angulo et al., 2008; Medina et al., 2008], or that are
available upon request. Nucleotide numbering reflects cDNA
numbering with +1 corresponding to the A of the ATG translation
initiation codon in the reference sequence. We called a
homozygous deletion when there was a reproducible
absence of PCR product for one or more consecutive exons. The
mutational status of STK11, SMARCA4, KRAS, and TP53 was
either determined for those cases with incomplete/conflicting
information or gathered from previous publications [Harbour
et al., 1988; Yokota et al., 1988; Otterson et al., 1994; Shimizu
et al., 1994; Matsumoto et al., 2007; Angulo et al., 2008] (Supp.
Table S1) or from the Wellcome Trust Sanger Institute's Cancer
Cell Line Project website (www.sanger.ac.uk/cosmic). In those
cases where mutation/deletion data were not available, cells with a
reported absence of RB protein expression were classified as
RB1-mutant. The presence of the EML4-ALK fusion gene was tested
according to previously published conditions [Soda et al., 2007].
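
The homozygous-deletion rule stated above (reproducible absence of PCR product for an exon) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the function name `call_homozygous_deletion`, the data layout (one list of replicate outcomes per exon), and the requirement of at least two replicates for "reproducible" are all hypothetical.

```python
from typing import Dict, List


def call_homozygous_deletion(pcr_results: Dict[int, List[bool]]) -> List[int]:
    """Return the exon numbers called as homozygously deleted.

    pcr_results maps exon number -> replicate outcomes (True = product seen).
    Assumption: an exon is called deleted only if at least two replicate PCRs
    were run and none of them yielded a product.
    """
    deleted = []
    for exon in sorted(pcr_results):
        replicates = pcr_results[exon]
        if len(replicates) >= 2 and not any(replicates):
            deleted.append(exon)
    return deleted


# Hypothetical example: exons 1 and 2 fail in both replicates, exon 3 amplifies.
example = {1: [False, False], 2: [False, False], 3: [True, True]}
print(call_homozygous_deletion(example))  # prints [1, 2]
```

A single failed reaction (for example `{1: [False, True]}`) is deliberately not called a deletion in this sketch, since the absence is not reproducible.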

Promoter  Hypermethylation 

Promoter hypermethylation at CDKN2A
was evaluated by bisulfite treatment of the genomic DNA and
subsequent methylation-specific PCR, using previously published
protocols [Esteller et al., 2001].

Real-Time  Quantitative  Genomic  PCR  for  Determining 
Gene  Amplification 

To determine MET, ERBB2, MYC, MYCL, and MYCN amplification,
we  used  quantitative  real-time  genomic  PCR.  The  conditions  and 
primers  used  for  MYC,  MYCN  and  MYCL  have  been  previously 
described  [Medina  et  al.,  2008].  ERBB2  and  MET  primers  and  PCR 
conditions  are  available  upon  request.  The  copy  number  of  genomic 
DNA  was  measured  by  SYBR  green  using  an  ABI  Prism  7900  Sequence 
Detector  (Applied  Biosystems,  Foster  City,  CA). 
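
The text does not state how copy number was computed from the SYBR green data. One common approach for quantitative genomic PCR is the comparative Ct (2^-ΔΔCt) calculation, sketched below purely as an assumption: the function name, the use of a reference locus and a normal-DNA calibrator, and the example Ct values are hypothetical, and the formula assumes roughly 100% amplification efficiency.

```python
def relative_copy_number(ct_target_sample: float,
                         ct_reference_sample: float,
                         ct_target_calibrator: float,
                         ct_reference_calibrator: float) -> float:
    """Comparative Ct estimate of target copy number relative to a calibrator.

    Assumes both amplicons amplify with ~100% efficiency (doubling per cycle).
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier in the
# cell line than in normal DNA, relative to the reference locus.
fold = relative_copy_number(22.0, 25.0, 25.0, 25.0)
print(fold)  # prints 8.0
```

The fold change at which a gene would be called amplified is not given in the text, so no cutoff is applied here.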

Inhibitors  and  Viability  Assay 

Rapamycin (mammalian target of rapamycin [mTOR] inhibitor)
and LY-294002 (PI3K inhibitor) were obtained from
Calbiochem (La Jolla, CA) and PHA665752 (MET inhibitor)
from Tocris Bioscience (Ellisville, MO). Erlotinib
(N-(3-ethynylphenyl)-6,7-bis(2-methoxyethoxy)-4-quinazolinamine) (EGFR
inhibitor) was a gift from Roche Pharmaceuticals (Mannheim,
Germany). Erlotinib tablets were ground to powder and dissolved
in pure dimethyl sulfoxide (DMSO) to the desired concentration.
For the cell-survival assays, cells were seeded at a density of 5,000
cells/well (15,000 cells/well for N417) on 96-well plates. They were
allowed to recover for 12 hr before adding the drugs. Cells were
exposed to various concentrations of each drug for 48 or 72 hr,
and then the viable cell number was measured by the
3-(4,5-dimethylthiazol-2)-2,5-diphenyltetrazolium bromide (MTT)
assay. Briefly, 10 µl of a solution of 5 mg/ml MTT (Sigma Chemical,
Zwijndrecht, The Netherlands) was added to each well. After
incubation for 3 hr at 37°C, the medium was discarded, the
formed formazan crystals were dissolved in 100 µl DMSO, and
absorbance was determined at 596 nm by means of a microplate
reader (Bio-Rad, Hercules, CA). Viabilities were expressed as a
percentage of the untreated controls. The concentration producing
50% growth inhibition (IC50) was determined from the dose-response
curve. Results are presented as the median of at least two
independent experiments performed in triplicate for each cell line
and each compound.
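
The viability and IC50 calculation described above can be sketched as follows. The text says only that the IC50 was determined from the dose-response curve, so the log-linear interpolation used here is an assumption, as are the function names and the example numbers; none of them come from the study.

```python
import math
from statistics import median
from typing import Optional, Sequence


def percent_viability(absorbance: float, control_absorbance: float) -> float:
    """Viability as a percentage of the untreated control absorbance."""
    return 100.0 * absorbance / control_absorbance


def ic50_from_curve(concentrations: Sequence[float],
                    viabilities: Sequence[float]) -> Optional[float]:
    """Interpolate (on a log10 concentration scale) where viability crosses 50%.

    concentrations must be positive and in ascending order.
    Returns None if the curve never crosses 50% in the tested range.
    """
    points = list(zip(concentrations, viabilities))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 50.0 >= v_hi and v_lo != v_hi:
            fraction = (v_lo - 50.0) / (v_lo - v_hi)
            log_ic50 = math.log10(c_lo) + fraction * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_ic50
    return None


# Hypothetical data: two independent experiments for one cell line and drug.
doses = [0.1, 1.0, 10.0]                      # µM, illustrative only
experiment_1 = ic50_from_curve(doses, [90.0, 70.0, 30.0])
experiment_2 = ic50_from_curve(doses, [95.0, 60.0, 20.0])
reported_ic50 = median([experiment_1, experiment_2])  # median across experiments
```

Fitting a sigmoidal model to the curve would be an equally plausible reading of the text; interpolation is used here only because it needs no external libraries.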

Antibodies  and  Western  Blot  Analysis 

Anti-phospho-AKT (S473), anti-AKT, anti-S6, anti-phospho-S6
(S235/236), anti-phospho-MET (Y1234/Y1235), and anti-MET
were obtained from Cell Signaling Technology (Beverly, MA). For
western blotting, cells were seeded in 12-well culture plates and,
after incubating for 24 hr with the designated drug, were scraped
from the dishes into lysis buffer. Forty micrograms (µg) of total
protein were separated by SDS-PAGE, transferred to a PVDF
membrane, and blotted with the appropriate antibody according to
the manufacturer's instructions.

Results 

Gene  Alteration  Profiles  of  a  Lung  Cancer  Cell  Line  Panel 

To accurately determine the frequency of point mutations and
homozygous/intragenic deletions of known cancer genes in lung
cancer, avoiding the masking effect of the admixture with
nonmalignant cells, we chose to screen cancer cell lines, including
small-cell lung cancer (SCLC), squamous cell carcinomas (SCC),
adenocarcinomas (AC), large-cell carcinomas (LCC), and carcinoids.
Eighty-eight lung cancer cell lines were tested for alterations
at 17 genes: AKT1, BRAF, MET, EGFR, ERBB2, KRAS, STK11,
MYC, MYCL, MYCN, NRAS, PIK3CA, PTEN, CDKN2A, RB1, and
TP53, as well as the EML4-ALK fusion. Alterations were present in
all genes except AKT1. The EML4-ALK fusion was never detected.
A total of 98% (86/88) of the cell lines had alterations in at least
one of the genes tested (Supp. Table S1 and Supp. Fig. S1). As
expected, alterations in tumor-suppressor genes were homozygous,
whereas they were often heterozygous in oncogenes. Although two
different heterozygous TP53 mutations were detected in three cell
lines, these mutations are likely to have occurred one in each
allele, resulting in the complete and biallelic inactivation of the
TP53 gene. The frequencies of alterations when considering all
histological types, from the highest to the lowest, were ranked as
follows: TP53 (79%), CDKN2A (59%), RB1 (35%), STK11 (27%),
MYC-family (20%), KRAS (17%), PTEN (11%), PIK3CA (8%),
EGFR (7%), NRAS (6%), MET (5%), BRAF (2%), and ERBB2
(2%). The present study does not extend to mutation analysis at
another key tumor-suppressor gene, SMARCA4, which has
recently been found to be frequently altered in NSCLC [Medina
et al., 2008]. Data on the mutation status of SMARCA4 for some
cell lines are also provided in Supp. Table S1.

To determine possible cell culture artifacts we compared the
mutational profile of lung cancer cell lines and lung primary tumors.
The mutational status of the TP53, STK11, KRAS, PIK3CA, EGFR,
and BRAF genes was available for non-small-cell lung primary
tumors [Angulo et al., 2008]. The ranking of the most commonly
mutated genes in lung primary tumors
(TP53 > KRAS > STK11 > EGFR > PIK3CA > BRAF) was very similar to that in cell
lines. However, the frequency of mutations at any gene in primary
tumors was about half that in lung cancer cell lines (Supp. Fig. S2),
suggesting a reduced effectiveness in the detection of gene alterations
in primary tumors, probably due to contamination by normal cells.
Alternatively, it is also possible that primary tumors are more
heterogeneous than cell lines with respect to the accumulated genetic
alterations. Since there are models for stepwise accumulation of
genetic alterations both for lung AC and SCC, we cannot
completely rule out that these differences arise as a consequence of
different progression stages between the tumors and cell lines
analyzed.

Gene  Alterations  and  Histopathological  Correlations 

The distribution of gene alterations among patient characteristics
and tumor histopathologies is summarized in Table 1. As
previously described, alterations in CDKN2A and STK11 were
preferentially found in NSCLC, whereas alterations in PTEN, RB1,
and in the MYC family of genes, especially MYCL and MYCN,
were more common in SCLC. It is also interesting to note that
mutations at other components of the EGFR/KRAS signal
transduction pathway, i.e., EGFR, ERBB2, BRAF, and NRAS,
predominate in lung AC. The differences did not reach statistical
significance, probably due to the small number of cell lines with
mutations at those genes. However, when combined,
mutations at any of the different components of the KRAS pathway
(EGFR, ERBB2, KRAS, NRAS, and BRAF) were significantly more
frequent in lung AC as compared to SCCs (P<0.05; Fisher’s exact
test) and in NSCLC as compared to SCLC (P<0.00005; Fisher’s
exact test). Alterations at TP53 were present at a similar frequency
in both SCLC and NSCLC, indicating that its inactivation is
required for the development of all histopathological types of lung
cancer. Although at very low frequency, mutations at PIK3CA were
also found in NSCLC and SCLC. The mutations found in the latter
correspond to novel variants that need verification.
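For readers who wish to reproduce this kind of comparison, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table (histological group versus presence of a pathway alteration). The function name and the example counts are hypothetical and are not taken from Table 1; the two-sided convention used here (summing all tables no more probable than the observed one) is one common choice and is assumed, not confirmed, to match the analysis reported above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x altered lines in the first group,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts, NOT taken from Table 1:
# rows = histology (AC, SCC), columns = (pathway altered, not altered).
p_value = fisher_exact_two_sided(12, 8, 2, 13)
print(f"two-sided P = {p_value:.4f}")
```

With small numbers of mutated cell lines per gene, the test has little power, which is consistent with the observation that only the combined pathway comparison reached significance.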

As  previously  reported,  mutations  at  KRAS  and  EGFR 
predominate  in  tumors  from  Caucasian  and  Asian  patients, 
respectively. However, a new observation that arises from our
study is the accumulation of alterations at the MYC family of genes
in tumors from patients of Caucasian origin (P<0.05; Fisher’s
exact test). No associations were detected between alterations at
any gene and gender or age, nor were gene alterations seen to have
accumulated in tumors of older patients. Rather than a definitive
observation, the lack of association between the presence of
mutations at EGFR and KRAS with tumors from nonsmokers and
smokers, respectively, is likely due to the lack of information on
the smoking habits of many of the individuals.

Identification  of  Novel  Variants 

In addition to well-known somatic mutations with an
oncogenic effect within the helical and kinase domains of PIK3CA
[Samuels et al., 2004; Gymnopoulos et al., 2007; Angulo et al.,
2008], we identified two novel variants, both located near
well-characterized mutation hotspots. One of these is an insertion of
387 nt after the termination codon TGA that results in the
duplication of amino acids 1,051 to 1,068 (Fig. 1B) and the other
is a p.D1029Y substitution. Since no matched normal DNA was
available for these cell lines, we could not test whether these
mutations are germline polymorphisms or tumor-specific mutations.
Four cell lines carried MET alterations, including gene
amplification and two novel variants, p.L1158F (in the HCC15
cells) and p.T1259K (in the H1963 cells) (Fig. 1B and C). Again,
due to the lack of matched normal DNA for these cell lines we
could not verify the somatic nature of the amino acid
substitutions. However, the absence of constitutive MET activation
indicated by the lack of pMET Y1234/Y1235 in these cell lines strongly
argues against an oncogenic role for the variants (Fig. 1D). The
H441, Calu-3, HCC366, and HCC78 cells that were reported to
have high levels of pMET Y1234/Y1235 [Rikova et al., 2007] did not
feature gene amplification or point mutations within the hotspots
tested here.

Cooperation  of  Several  Biological  Pathways  in  Lung 
Carcinogenesis 

It is widely accepted that cancer cells do not accumulate redundant
alterations of genes in the same biological pathway. Accordingly,
genes that are altered in a mutually exclusive manner are likely to
encode proteins that act in the same biological pathway. This
hypothesis has been extensively borne out in lung cancer cells by
the lack of concomitant alterations at RB1 and CDKN2A, and at
EGFR and KRAS. Our data also confirm the mutually exclusive
nature of these pairs of alterations (Fig. 1A). Likewise, alterations
at ERBB2 and NRAS did not occur in the same cell lines or in cells
carrying EGFR and KRAS mutations, consistent with their
participation in the same signal transduction pathway. PTEN
and PIK3CA, which both encode proteins that modulate the
intracellular levels of phosphoinositide-3,4,5-trisphosphate
(PIP3), were also found to be mutated in a mutually exclusive
manner. Only one cell line, Lu134, with a homozygous deletion at
PTEN, had a concomitant change at PIK3CA. The PIK3CA variant
is a p.D1029Y substitution, which has not been described before
and for which there is no evidence of its somatic nature. On the
other hand, there were concomitant BRAF- and NRAS-activating
mutations in the H2087 lung adenocarcinoma cells. The somatic
nature of the p.L597V mutation in BRAF was confirmed after
sequencing the DNA of the corresponding lymphoblastoid line
(BL-H2087). In addition, simultaneous mutations in signal
transduction pathways that are known to converge in the
modulation of mTOR activity, such as MET, PIK3CA/PTEN,
STK11, and KRAS/EGFR/NRAS/ERBB2, were present in some cell
lines, implying cooperation in cancer development. Namely, 17
(28%) of the 61 NSCLC cell lines carried single mutations,
whereas 16 (26%) and two (3%) of them carried double and triple
mutations, respectively, in this group of genes.
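The two tallies described above (which gene pairs are never altered together, and how many lines carry one, two, or three alterations within a gene group) can be sketched as follows. The genotypes are hypothetical toy data, not the cell-line profiles of Fig. 1A, and the function names are ours; the sketch only illustrates the bookkeeping, not the authors' actual analysis.

```python
from itertools import combinations

# Hypothetical toy genotypes (cell line -> set of altered genes); NOT the study data.
genotypes = {
    "line_A": {"KRAS", "STK11"},
    "line_B": {"EGFR"},
    "line_C": {"KRAS", "PIK3CA", "STK11"},
    "line_D": {"PTEN"},
    "line_E": set(),
}


def co_occurrence(genotypes, gene_x, gene_y):
    """Count lines altered in both genes, in gene_x only, and in gene_y only."""
    both = sum(1 for g in genotypes.values() if gene_x in g and gene_y in g)
    only_x = sum(1 for g in genotypes.values() if gene_x in g and gene_y not in g)
    only_y = sum(1 for g in genotypes.values() if gene_y in g and gene_x not in g)
    return both, only_x, only_y


def mutually_exclusive_pairs(genotypes, genes):
    """Return gene pairs that are each altered somewhere but never in the same line."""
    pairs = []
    for x, y in combinations(sorted(genes), 2):
        both, only_x, only_y = co_occurrence(genotypes, x, y)
        if both == 0 and only_x > 0 and only_y > 0:
            pairs.append((x, y))
    return pairs


def burden(genotypes, genes):
    """Tally lines by the number of genes in the group that they carry altered."""
    group = set(genes)
    counts = {}
    for g in genotypes.values():
        k = len(g & group)
        counts[k] = counts.get(k, 0) + 1
    return counts


mtor_group = {"KRAS", "EGFR", "PTEN", "PIK3CA", "STK11"}
exclusive = mutually_exclusive_pairs(genotypes, mtor_group)
per_line = burden(genotypes, mtor_group)
```

In a panel of modest size, a pair that never co-occurs is only suggestive of pathway redundancy; a formal test (for example, Fisher's exact test on the co-occurrence counts) would be needed to support the claim statistically.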

Table 1. Distribution of the Indicated Mutations Among the Different Characteristics of the Lung Cancer Cell Lines
a Analysis performed only for the adenocarcinoma and large-cell carcinoma cell lines.
b Analysis performed only for the adenocarcinoma cell lines.
includes the following categories: one mesothelioma, one carcinoid, and one neuroendocrine.
* Asian vs. Caucasian comparison.
HIST, histopathological type.

Figure 1. Gene alterations in lung cancer cell lines. A: Profile of genes altered in human lung cancer cell lines. The presence of alterations is
indicated by gray bars. Black squares indicate no data. The black lines in the PIK3CA oncogene refer to the two variants of unknown oncogenic
potential. The histopathology is also shown. B: PIK3CA and MET variants in the RERF-LC-OK and HCC15 cell lines. Nucleotide numbering reflects
cDNA numbering with +1 corresponding to the A of the ATG translation initiation codon in the reference sequence. C: MET gene amplification in
lung cancer cell lines revealed by quantitative PCR. The relative MET copy number was determined by comparison with an unrelated
control locus, MDH2, on chromosome 7q11. Cells with MET amplification are indicated with an arrow. D: Western blot with anti-phospho-MET
(pMET Y1234/Y1235) and anti-MET (MET) antibodies in the indicated cell lines. Constitutive MET activation is present in the EBC-1 and Calu-3 cells, but not in
the HCC15 and H1963 cells, which carry gene variants of unknown biological significance.

Correlation of Acquired Genetic Alterations With
Sensitivity to Small Molecule Inhibitors

To understand a possible effect of these genetic alterations on
the primary resistance to tyrosine kinase inhibitors (TKIs) and
other small molecule inhibitors, we selected a panel of 10 lung
cancer cell lines with a known genetic background for KRAS,
STK11, EGFR, PTEN, PIK3CA, and MET, and tested the sensitivity
to treatment with inhibitors of PI3K (LY294002), mTOR
(rapamycin), MET (PHA665752), and EGFR (erlotinib). As
surrogate markers to test the ability of the drug to inhibit its
target molecule, we measured the levels of pAKT Ser473 (for the PI3K
and EGFR inhibitors), pS6 Ser235/236 (for the mTOR inhibitor), and
pMET Y1234/Y1235 (for the MET inhibitor). The calculated IC50 for the
different  compounds  is  summarized  in  Figure  2A.  A  marked 
genotype-drug  sensitivity  association  was  observed  for  the  Calu-3 
and  EBC-1  cells,  which  were  highly  responsive  to  growth 
inhibition  triggered  by  erlotinib  and  PHA665752  compounds, 
respectively.  The  effectiveness  of  these  treatments  was  also 
measured  by  their  ability  to  decrease  phosphorylation  at  their 
target  molecules  or  at  downstream  effectors  (Fig.  2B).  We  did  not 
observe a low IC50 in response to treatment with PHA665752 in
the H1963 or HCC15 cell lines (data not shown). These cell lines carry
amino acid substitutions at the tyrosine kinase domain of MET, so their
lack of sensitivity is a further indication that these variants are not
functionally significant. Similarly, the Calu-3 cells that carry high
levels of MET phosphorylation (Fig. 1D) but do not exhibit gene amplification
or mutations were insensitive to PHA665752. Interestingly, the
H522 cells evidenced a strong sensitivity to PHA665752. These
cells carry neither amplification/point mutations at MET nor
MET phosphorylation. Thus, the characterization of the gene
alterations  underlying  the  sensitivity  of  these  cells  to  MET 
inhibitors  will  be  of  interest.  Although  the  differences  were  not 
as marked, we also noted that sensitivity to LY294002, as indicated
by the lower IC50, was increased in the H446 and N417 cell lines,
both of which are PTEN-deficient. Similarly, the lowest IC50 values for
rapamycin were observed for the N417, H446, EBC-1, and Calu-3
cells (Fig. 2A and B). Some of these cells carry constitutive
activation of AKT due to PTEN inactivation (N417 and H446)
or to ERBB2 gene amplification (Calu-3).
Intriguingly, the triple-mutant KRAS-STK11-PIK3CA (H460) and
double-mutant EGFR-PTEN (H1650) cells were extremely resistant to
rapamycin, LY294002, and erlotinib. Thus, we investigated the effect of
combined treatment with erlotinib and LY294002 on cell growth,
and found that the addition of erlotinib significantly increased the
efficiency of cell-growth inhibition by LY294002 in
H1650 cells, but not in H460 cells (Fig. 3A and B).
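As an illustration of how an IC50 can be read from survival curves such as those obtained in MTT assays, the sketch below interpolates linearly on a log-concentration scale between the two doses that bracket 50% survival. This is only one simple estimator; the method actually used to calculate the values in Figure 2A is not stated here, and the dose series and survival fractions below are hypothetical.

```python
import math


def ic50_log_interpolation(concentrations, survival):
    """Estimate IC50 by linear interpolation of survival against log10(concentration).

    concentrations: increasing doses (the IC50 is returned in the same unit)
    survival: fraction of untreated control at each dose (1.0 = no inhibition)
    Returns None if survival never crosses 0.5 within the tested range.
    """
    points = list(zip(concentrations, survival))
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        if s_lo >= 0.5 >= s_hi and s_lo != s_hi:
            # Position between c_lo and c_hi (on a log scale) where survival reaches 0.5.
            t = (s_lo - 0.5) / (s_lo - s_hi)
            log_ic50 = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


# Hypothetical readings (fraction of untreated control); NOT data from the study.
doses_um = [0.1, 1.0, 10.0, 100.0]
sensitive = [0.95, 0.70, 0.30, 0.10]
resistant = [1.00, 0.98, 0.90, 0.75]

print(ic50_log_interpolation(doses_um, sensitive))
print(ic50_log_interpolation(doses_um, resistant))
```

Returning None for a curve that never reaches 50% inhibition keeps resistant lines distinct from lines with a high but measurable IC50; fitting a sigmoidal dose-response model would be the more rigorous alternative.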

Discussion 

We  provide  a  detailed  gene-alteration  profile  of  lung  cancer  cells 
of distinct histologies. In full compliance with Knudson’s two-hit
hypothesis  [Knudson,  1971],  mutations  in  tumor  suppressors,  but 
not  in  oncogenes,  were  always  homozygous.  We  also  confirmed 
the  disproportionately  high  frequency  of  occurrence  of  some  gene 
alterations  in  specific  histological  types,  which  probably  reflects 
differences  in  the  cell  type  of  origin.  The  overall  profile  of  genes 
mutated  in  lung  cancer  was  comparable  between  lung  primary 
tumors  and  lung  cancer  cell  lines.  However,  the  frequency  of 
mutations  at  any  gene  was  higher  in  cell  lines,  which  strongly 
implies  a  masking  effect  due  to  the  admixture  of  nonmalignant 
cells  that  hinders  the  detection  of  point  mutations  and  insertions/ 
deletions  in  the  primary  tumors.  This  obstacle  has  been  noted 
before  [Sanchez-Cespedes,  2007;  Thomas  et  al.,  2006]  and  is  a 
significant  problem  that  may  be  solved  by  the  use  of  a  novel 
generation of sequencers [Thomas et al., 2006], or by other
technical  approaches  like  careful  microdissection  of  tumor  cells. 

TP53 was the most frequently altered gene in the lung cancer
cell lines. Nearly 80% of the cell lines carry alterations of this
tumor suppressor. Similarly, alterations at the cell cycle
components, either RB1 or CDKN2A, were also extremely common. The
high frequency of TP53 and CDKN2A/RB1 alterations in all
histopathologies is a demonstration of their important role in
lung cancer development. It is tempting to speculate that TP53
and CDKN2A/RB1 inactivation in lung cancer may be universal
and is thus a requisite for the evolution of lung tumors.
Conversely, alterations at some oncogenes, such as BRAF, ERBB2,
and MET, were infrequent.

Figure 2. Genotype of the cell lines and sensitivity to specific inhibitors. A: The IC50 (µM) for each compound (RAPA, rapamycin; LY, LY294002; PHA,
PHA665752; and Erlo, erlotinib) is indicated within the boxes. Treatments were applied for 72 hr. B: Immunoblotting analysis depicting the decreased
phosphorylation of the indicated protein upon administering increasing concentrations of the compound. Treatments were applied for 24 hr.

Figure 3. Cell-growth inhibition upon administering combined
LY294002 and erlotinib treatment. Lines represent the cell survival
relative to untreated controls in the MTT assays in the H1650 and
H460 cells treated with increasing concentrations of LY294002, alone
or with 5 µM erlotinib, for 72 hr. Error bars indicate the standard
deviation of three replicates.

The differences in the activation of components of the KRAS
pathway among the lung cancer histopathologies were remarkable.
While alterations at any of BRAF, EGFR,
ERBB2, KRAS, or NRAS were significantly more common in AC as
compared to SCC, virtually none of the SCLC lines carried alterations at
any of those genes. This strongly points towards completely
different mechanisms of carcinogenesis for NSCLC and SCLC and
likely accounts for the distinct clinical behavior of both types of
lung cancer.

Although  mutations  outside  the  hotspots  may  increase  the 
frequency  of  alterations  at  these  genes  to  some  extent,  it  seems 
certain  that  their  contribution  will  be  confined  to  a  small  subset  of 
lung  tumors.  However,  given  that  the  encoded  proteins  are  targets 
for  small  molecule  inhibitors,  the  context  in  which  these 
mutations  arise  (e.g.,  histological  type,  concomitant  mutations 
at other genes) needs to be better understood. We confirmed the
lack of concomitant mutations in those genes encoding proteins
acting in the same biological pathway, such as CDKN2A/RB1,
KRAS/EGFR/ERBB2, and PIK3CA/PTEN. Apart from these,
simultaneous alterations were found in most of the other genes.
Intriguingly, we also found that BRAF and NRAS were genetically
altered in the same cells, suggesting that the collaboration of the
encoded proteins affects the development of the cancer. Similarly,
it was previously reported that BRAF mutations involving codons
other than 600 or 601 were highly likely to co-occur with a RAS
family mutation [Thomas et al., 2007]. It is interesting to note the
frequent concomitant activation of signal transduction pathways
that converge in the modulation of mTOR activity upon different
stimuli, such as KRAS/EGFR/ERBB2, PIK3CA/PTEN, and STK11
[Corradetti and Guan, 2006].

Selective small molecule inhibitors against molecules that participate in
different signaling pathways have been approved or are at various
stages of development for clinical use in cancer patients. In this
new scenario of targeted therapies, the response to a given
therapeutic drug is likely to depend on the genetic background of
the tumor. Similarly to previous observations [McDermott et al.,
2007], our present results show how lung cancer cells with single
alterations at MET, PTEN, or ERBB2/EGFR are sensitive to MET
(PHA665752), PI3K (LY294002), and EGFR (erlotinib) inhibitors,
respectively. However, this does not hold true in cells with
activation of multiple signaling pathways, suggesting that there are
interconnections among pathways that enable cells to bypass the
negative effects on cell growth triggered by the inhibitor. We
found  that  in  the  originally  resistant  EGFR/PTEN  double-mutant 
cells,  erlotinib  sensitized  the  cells  to  the  effect  of  the  LY294002 
compound,  which  suggests  that  the  use  of  drug  combination 
strategies  could  improve  sensitivity  to  specific  therapies.  Current 
efforts  to  understand  the  mechanisms  of  tumor  resistance, 
especially  to  TKIs  in  lung  cancer,  further  support  this  hypothesis 
[Rikova et al., 2007; Engelman et al., 2007]. Guo et al. [2008]
reported that in EGFR-mutant cells that are sensitive to EGFR
inhibitors, EGFR drives other receptor tyrosine kinases (RTKs)
and a network of downstream signaling that collapses with drug
treatment. In these cells, secondary drug resistance appears
through the generation of novel gene alterations at another
RTK, MET, preventing such collapse and thus bypassing the
inhibitory effect of the drug. Taken together, these observations are
strong evidence that different signal transduction pathways
assemble in networks, through the use of some common
components. Beyond the contribution to the understanding of
cell biology, our observations draw attention to the need to stratify
tumors according to their genotype and histology and suggest that
the combination of pathway-selective therapies will eventually be
required for the treatment of many solid tumors.

Background


The  Challenge 

The Lung Cancer SPORE at The University of Texas MD
Anderson Cancer Center and Southwestern Medical
School requires the integration of heterogeneous
multi-institutional sources comprising both
molecular and clinical data.

The  Technology 

We describe a novel method for converging
domain-specific experimental ontologies that relies on
propagating permissions in Resource Description
Framework (RDF) triples rather than the single
access point of conventional relational databases.

The challenge is addressed by combining Semantic
Web data reposition with code distribution. The
S3DB Core Model [1,2] was used to represent each
data element in the Lung Cancer Dataset as RDF
triples.


Conclusions 


Traditional web 1.0 tools are not appropriate for
managing translational datasets, which typically
include molecular as well as clinical data.

Using a Semantic Web management model for
integration such as S3DB, experimental data may
be queried using Semantic Web technologies
such as SPARQL, the query language for RDF.

Software Design Patterns


1  -  SEMANTIC  CORE  MODEL 


Core  data  model  developed  for  S3DB  (supported  by  version  3.0 
onwards).  This  diagram  can  be  read  starting  from  the  most 
fundamental data unit, the Attribute-Value pair (filled hexagonal
and square symbols). Each element of the pair is the object of two
distinct triples, one describing the domain of discourse, the
Rules, and the other made of Statements where that domain is
populated to instantiate relationships between entities. The latter
includes the actual Values. Surrounding these two nuclear
collections of triples is the resolution of Collection and its
instantiation as Item, which define the relationship between the
individual elements of Rules and Statements. The resulting
structure  is  then  organized  in  Projects  in  such  a  way  that  the 
domain  of  discourse  can  nevertheless  be  shared  with  other 
Projects,  in  the  same  or  in  a  distinct  deployment  of  S3DB. 

Finally,  a  propagation  of  User  permissions  (dashed  line)  is 
defined  such  that  the  distribution  of  the  data  structures  can  be 
traced. 


Generic  interfaces 




S3DB provides a web service for data
discovery that can be accessed through a RESTful
API. Generic interfaces and standalone analytic
applications query the data elements using a
SPARQL endpoint, available with each deployment
of S3DB, to perform queries that are distributed by
the deployments where the data is kept.


S3DB Fact Sheet:

Availability: http://s3db.org.

Source code: PHP (5+); License: GNU GPL.

Downloads: ~2/day since Jan 2008; Registered deployments: 248.

API: REST (Representational State Transfer).

I/O: RDF, XML, tab-delimited.

Client applications: http://bioinformaticstation.org.
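To make the idea of querying data elements through a SPARQL endpoint more concrete, the following sketch matches a basic graph pattern (a list of triple patterns with ?variables) against a small in-memory set of triples. It is a toy matcher written for illustration, not the S3DB API or a SPARQL implementation, and every identifier (the ex: and patient: names, the attribute labels) is hypothetical.

```python
# Hypothetical triples; the identifiers are illustrative, not S3DB's actual vocabulary.
TRIPLES = [
    ("patient:12345", "rdf:type", "ex:Patient"),
    ("patient:12345", "ex:age", "27"),
    ("patient:12345", "ex:hasSample", "sample:1"),
    ("sample:1", "rdf:type", "ex:Sample"),
    ("sample:1", "ex:histology", "adenocarcinoma"),
    ("patient:67890", "rdf:type", "ex:Patient"),
    ("patient:67890", "ex:hasSample", "sample:2"),
    ("sample:2", "ex:histology", "squamous cell carcinoma"),
]


def is_var(term):
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check consistency with an earlier binding.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def query(patterns, triples):
    """Return every variable binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        extended_bindings = []
        for b in bindings:
            for t in triples:
                extended = match_pattern(pattern, t, b)
                if extended is not None:
                    extended_bindings.append(extended)
        bindings = extended_bindings
    return bindings


# Roughly equivalent to the SPARQL query:
#   SELECT ?patient WHERE { ?patient ex:hasSample ?s . ?s ex:histology "adenocarcinoma" }
results = query(
    [("?patient", "ex:hasSample", "?s"), ("?s", "ex:histology", "adenocarcinoma")],
    TRIPLES,
)
print(results)
```

The point of the sketch is that the query names the desired properties of the data (a patient with an adenocarcinoma sample) rather than the table or file in which the data is stored.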

2  -  PERMISSION  MIGRATION 
AND  DATA  ACCESS 

Distinct  users,  with  identities  (solid  icon)  managed  in 
distinct  S3DB  deployments  (circular  compartments), 
which  they  control  separately,  share  a  distributed 
and  overlapping  data  structure  (arrows  between 
symbols)  that  they  also  manage  independently: 
some data elements are shared (mixed color
symbols), others are not. This requires identity
verification to propagate between deployments
peer-to-peer (P2P, dotted lines), including to deployments
where  neither  user  maintains  an  identity  (dotted 
circular  compartment).  This  is  in  contrast  with  the 
conventional  approach  of  having  distinct  users 
manage  insular  deployments  with  permissions 
managed  at  the  access  point  level. 
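One way to picture permissions that travel with the data structure, rather than being enforced at a single access point, is sketched below: a permission granted on one entity is inherited by the entities nested under it unless a more specific grant overrides it. The hierarchy, the user names, the permission levels, and the inheritance rule are all assumptions made for illustration; the actual S3DB propagation rules and the peer-to-peer identity verification are not reproduced here.

```python
# Hypothetical entity hierarchy (child -> parent), loosely following the core model:
# Project > Collection > Item > Statement. Names are illustrative only.
PARENT = {
    "collection:clinical_data": "project:lung_spore",
    "item:patient_12345": "collection:clinical_data",
    "statement:age_27": "item:patient_12345",
}

# Explicit grants: (user, entity) -> permission. Anything not listed is inherited.
GRANTS = {
    ("alice", "project:lung_spore"): "write",
    ("bob", "project:lung_spore"): "read",
    ("bob", "item:patient_12345"): "none",  # override: hide one patient from bob
}


def effective_permission(user, entity, default="none"):
    """Walk up from `entity` until an explicit grant for `user` is found."""
    current = entity
    while current is not None:
        if (user, current) in GRANTS:
            return GRANTS[(user, current)]
        current = PARENT.get(current)
    return default


print(effective_permission("alice", "statement:age_27"))
print(effective_permission("bob", "statement:age_27"))
```

Because each decision is derived from grants attached to the data elements themselves, the same resolution could in principle be carried out in whichever deployment holds the element.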


3  -  DOMAIN  REPRESENTATION 


Relevant  data  elements  of  the  domain,  such  as  individual 
images  of  the  Tissue  Microarrays  (1)  and  Personal  Health 
Record  (2)  data  are  assigned  to  each  element  of  the  S3DB 
Core  Model  by  the  domain  expert.  The  concepts  of  Sample 
and  Tissue  Microarray,  for  example,  are  assigned  to 
Collections  (red  and  yellow  nodes)  and  the  relationships 
between  two  Collections  or  between  a  Collection  and  an 
attribute  such  as  "Age"  (green  nodes)  are  assigned  to  the 
Rules (grey lines). Elements that represent instances of
Collections are assigned to Items; for example, "Patient
#12345" is assigned to an Item of the Collection "Clinical
Data".  Finally,  the  value  for  a  given  attribute,  such  as  "Age 
27"  is  assigned  to  a  Statement. 
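The assignment of domain concepts to the four elements of the core model can be sketched with a few plain data classes. The class and field names follow the terms used above (Collection, Rule, Item, Statement), but the layout, the verb "has", and the flattening into a triple are our own assumptions for illustration and do not reproduce the S3DB schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collection:
    name: str


@dataclass(frozen=True)
class Rule:
    subject: Collection
    verb: str
    obj: str  # an attribute name such as "Age", or the name of another Collection


@dataclass(frozen=True)
class Item:
    collection: Collection
    label: str


@dataclass(frozen=True)
class Statement:
    item: Item
    rule: Rule
    value: str


def to_triple(statement):
    """Flatten a Statement into a (subject, predicate, object) triple."""
    if statement.item.collection != statement.rule.subject:
        raise ValueError("Item does not belong to the Collection the Rule describes")
    predicate = f"{statement.rule.verb} {statement.rule.obj}"
    return (statement.item.label, predicate, statement.value)


# The example from the text: Patient #12345, an Item of "Clinical Data", has Age 27.
clinical = Collection("Clinical Data")
has_age = Rule(clinical, "has", "Age")  # the verb "has" is an assumed label
patient = Item(clinical, "Patient #12345")
age = Statement(patient, has_age, "27")

print(to_triple(age))
```

Keeping the Rules (the domain of discourse) separate from the Statements (its instantiation) is what allows the same Rules to be shared across Projects while the values remain under separate control.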


Abbreviations: S3DB - Simple Sloppy Semantic
Database; RDF - Resource Description
Framework; TCGA - The Cancer Genome Atlas;
SPARQL - SPARQL Protocol and RDF Query Language

Abstract

Background:  Data,  data  everywhere.  The  diversity  and  magnitude  of  the  data  generated  in  the  Life  Sciences  defies 
automated  articulation  among  complementary  efforts.  The  additional  need  in  this  field  for  managing  property  and  access 
permissions  compounds  the  difficulty  very  significantly.  This  is  particularly  the  case  when  the  integration  involves  multiple 
domains  and  disciplines,  even  more  so  when  it  includes  clinical  and  high  throughput  molecular  data. 

Methodology/Principal  Findings:  The  emergence  of  Semantic  Web  technologies  brings  the  promise  of  meaningful 
interoperation  between  data  and  analysis  resources.  In  this  report  we  identify  a  core  model  for  biomedical  Knowledge 
Engineering  applications  and  demonstrate  how  this  new  technology  can  be  used  to  weave  a  management  model  where 
multiple  intertwined  data  structures  can  be  hosted  and  managed  by  multiple  authorities  in  a  distributed  management 
infrastructure.  Specifically,  the  demonstration  is  performed  by  linking  data  sources  associated  with  the  Lung  Cancer  SPORE 
awarded to The University of Texas MD Anderson Cancer Center at Houston and the Southwestern Medical Center at Dallas.
A  software  prototype,  available  with  open  source  at  www.s3db.org,  was  developed  and  its  proposed  design  has  been  made 
publicly  available  as  an  open  source  instrument  for  shared,  distributed  data  management. 

Conclusions/Significance: The Semantic Web technologies have the potential to address the need for distributed and
evolvable representations that are critical for Systems Biology and translational biomedical research. As this technology is
incorporated into application development we can expect that both general purpose productivity software and domain
specific software installed on our personal computers will become increasingly integrated with the relevant remote
resources. In this scenario, the acquisition of a new dataset should automatically trigger the delegation of its analysis.

Introduction

Data management and analysis for the life sciences

“The laws of Nature are written in the language of
mathematics” famously said Galileo. However, in recent years
efforts to analyze the increasing amount and diversity of data in
the Life Sciences have been constrained not so
much by our ability to read it as by the challenge of organizing it.
The urgency of this task and the reward of even partial success in
its accomplishment have caused the interoperability between
diverse digital representations to take center stage [1-5]. Presently,
for those in the Life Sciences enticed by Galileo’s pronouncement,
the effort of collecting data is no longer focused solely on field/
bench work. Instead, it often consists of painfully squeezing the
pieces of the systemic puzzle from the digital media where the raw
data is held hostage [6]. It is only then that a comprehensive
representation amenable to mathematical modeling really
becomes available [7]. This is not a preoccupation exclusive to the
Life Sciences. Integration of software applications is also the
driving force behind new information management systems
architectures that seek to eliminate the boundaries to
interoperability between data and services. This preoccupation indeed
underlies the emergence of service oriented architectures [8-11],
even more so in its event driven dynamic generalization [12]. It
also underlies the development of novel approaches to software
deployment (Figure 1) that juggle data structures between server
and client applications. Presently, a particularly popular design
pattern is the usage-centric Web 2.0 [13,14], which seeks a delicate
balance in the distribution of tasks between client and server in
order to diminish the perception of a distinction between local and
remote computation.

Semantic web technologies [3,15-21] represent the latest
installment of web technology development. In what is being
unimaginatively designated as Web 3.0 [22,23], a software
development design pattern is proposed where the interoperability
boundaries between data structures, not just between the systems
that produce them, are set to disappear. The defining characteristic
of this environment is that one can retrieve data and information
by specifying their desired properties instead of explicitly
(syntactically) specifying their physical location. The desirability
of this design can clearly be seen in systems in which clinical
records are matched with high throughput molecular profiles, each
of which stems from a very distinct environment and is often the
object of very different access management regulations.

Inadequacy  of  conventional  systems  for  Translational 
Research 

On the one hand, high throughput molecular Biology core
facilities and improved medical record systems are able to
document individual data elements with increasing detail. On
the other hand, researchers producing the data and models that
critically advance the understanding of biological phenomena are
increasingly separated from their use by the specialization inherent
in each of these activities. Consequently, bridging between the
information systems of basic research and their clinical application
becomes a necessary foundation for any translational exploits of
new biomedical knowledge [3,24]. The alternative, using
conventional data representations where the data models cannot evolve,
typically requires the biomedical community to complement the
data representation with a clandestine and inefficient flurry of
datasets exchanged as spreadsheets through email.

Foundations  for  a  novel  solution 

As  others  before  us  [5],  we  have  argued  previously  for  the  use  of 
semantic  web  formats  as  the  foundation  for  developing  more  flexible 
and  articulated  data  management  and  analytical  bioinformatics 
infrastructures  [20].  A  software  prototype  was  then  produced 
following  those  technical  specifications  to  provide  a  flexible  web- 
based  data  sharing  environment  within  which  a  management  model 
can  be  identified  [24].  In  this  third  report  we  describe  the  resulting 
core  model  supporting  distributed  and  portable  data  representation 
and  management.  In  practice  this  translates  into  a  small  application 
deployed  in  multiple  locations  rather  than  a  large  infrastructure  at  a 
single central location. The open source prototype application
described here has been made public [25]. All deployments support a
common  data  management  and  analysis  infrastructure  with  no 
constraints  on  the  actual  data  structures  described. 




Figure  1.  Three  generations  of  design  patterns  for  web-based 
applications.  The  original  design  ("1.0")  consists  of  collections  of 
hypertext  documents  that  are  syntactically  (dashed  lines)  interoperable 
(traversing  between  them  by  clicking  on  the  links),  regardless  of  the 
domain  content.  The  user  centric  web  2.0  applications  use  internal 
representations  of  the  external  data  structures.  This  representation  is 
asynchronously  updated  from  the  reference  resources  which  are  now 
free  to  have  a  specialized  interoperation  between  domain  contents.  An 
example  of  this  approach  is  that  followed  by  AJAX-based  interfaces. 
Finally,  the  ongoing  emergence  of  the  semantic  web  promises  to 
produce  service  oriented  systems  that  are  semantically  interoperable 
such  that  the  interface  application  reacts  to  domains  of  knowledge 
specifically.  At  this  level  all  applications  tend  to  be  web-interoperable 
with  peer-to-peer  architectures  complementing  the  client-server  design 
of w1.0 and w2.0.
A very brief history of data

The  formatting  of  data  sets  as  portable  text  mirrors  the  same 
three stages described for web-based applications in Figure 1. As
described in Figure 2, data representation has been evolving from
tabular text formats (“flat files”), to self-described hierarchical trees
of tags (extensible markup languages, XML), and finally to the
subject-predicate-object triples of the Resource Description Framework
(RDF) [26]. We have been active participants in these
transformations  [24,27,28],  and  like  many  others  concluded  that 
in  order  to  bridge  the  fragmentation  between  distinct  data 
structures,  we  needed  to  break  down  the  data  structures 
themselves  [20],  that  is,  to  reduce  the  interoperable  elements  to 
RDF  triples  [29].  In  addition  to  its  directed  labeled  graph  nature, 
RDF  formats  [29]  have  a  second  defining  characteristic:  each  of 
the  three  elements  has  a  Uniform  Resource  Identifier  (URI), 
which, for the purposes of this very brief introduction, can be
thought of as a unique locator capable of directing an application to
the  desired  content  or  service.  It  is  also  interesting  to  note  that  at 
each  level  of  this  three-stage  progression  (Figure  2)  we  find  data 
elements  that  have  “matured”,  that  is,  that  present  a  stable 
representation  which  remains  useful  to  specialized  tools.  When  this 
happens we find that those elements remain convenient representations
preserved whole within more fragmented formats. For
example, we find no advantages in breaking down mzXML [30]
representations  of  mass  spectrometry  based  proteomics  data. 
Instead,  these  data  structures  are  used  as  objects  of  regular  RDF 
triples. The mzXML proteomics data structure offers a
paradigmatic illustration of the evolution of ontologies as efforts
to standardize data formats [31]. It would be interesting to
understand whether the lengthy effort headed by the Human Proteome
Organization, HUPO, to integrate it reflects the difficulty of justifying
the reform [32] of a representation that remains useful [33].
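
The three-stage progression described above can be illustrated with a minimal sketch in Python. The record, the tag names, the placeholder namespace http://example.org/ and the function names are all assumptions made for this illustration; none of them is taken from S3DB or from any of the cited formats.

```python
import xml.etree.ElementTree as ET

# Stage 1: a tab-delimited "flat file" (hypothetical record).
FLAT = "id\tcolor\tsize\nsky\tblue\tlarge\n"

BASE = "http://example.org/"  # placeholder namespace, an assumption


def flat_to_rows(text):
    """Parse the flat file into a list of attribute-value dictionaries."""
    header, *lines = text.strip().split("\n")
    names = header.split("\t")
    return [dict(zip(names, line.split("\t"))) for line in lines]


def rows_to_xml(rows):
    """Stage 2: a self-described hierarchical tree of tags."""
    root = ET.Element("entries")
    for row in rows:
        entry = ET.SubElement(root, "entry", id=row["id"])
        for name, value in row.items():
            if name != "id":
                ET.SubElement(entry, name).text = value
    return root


def rows_to_triples(rows):
    """Stage 3: subject-predicate-object triples with URI-named resources."""
    triples = []
    for row in rows:
        subject = BASE + row["id"]
        for name, value in row.items():
            if name != "id":
                triples.append((subject, BASE + name, value))
    return triples


rows = flat_to_rows(FLAT)
xml_text = ET.tostring(rows_to_xml(rows), encoding="unicode")
triples = rows_to_triples(rows)
```

In this sketch the triples no longer depend on the layout of the table or of the tree, which is the sense in which the data structures are "broken down" into interoperable elements.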

The  advancement  towards  a  more  abstract,  more  global  and 
more  flexible  representation  of  data  is  by  no  means  unique  to  the 
Life  Sciences.  However,  because  of  the  exceptional  diversity  of 
Figure 2. Evolution of formats for individual datasets. Hexagons, rectangles and small circles indicate data elements, respectively, attributes,
their values, and relations. First, flat file formats such as FASTA or the GenBank data model were proposed to collect attribute-value pairs about an
individual data entry. The use of tagging by extensible markup languages (XML) allowed for the embedding of additional detail and further definition
of the nature of the hierarchical structure between data elements. More recently, the resource description framework (RDF) further generalized the
XML tree structure into that of a network where the relationship between resources (nodes) is a resource itself. Furthermore, the referencing of each
resource by a unique identifier (URI) implies that the data elements can be distributed between distinct documents or even locations.
doi:10.1371/journal.pone.0002946.g002


that  domain’s  fluidity,  the  Life  Sciences  are  where  the  Semantic 
Web  may  find  its  most  interesting  challenge  and  as  well,  hopefully, 
where  it  will  find  its  most  compelling  validation  [15]. 

Mathematics  for  data  models 

It has not been lost on the swelling ranks of Systems Biologists
that  the  reduction  of  data  interoperability  to  the  ternary 
representation  of  relations  [34]  brings  the  topic  solidly  back  to  the 
Galilean  fold  of  Mathematics  as  a  language.  The  reduction  of  data 
structures  to  globally  referenced  dyadic  relations  (functions  of  two 
variables),  such  as  those  of  the  Entity-Relationship  (ER)  model, 
brings  in  rich  feeds  from  the  vein  of  Logic.  In  the  process,  and 
beyond Galileo’s horizon, assigning a description logic value [35-37]
to some RDF predicates (for example, specifying that
something  is  part  of  or,  on  the  contrary,  is  distinct  from  something 
else)  allows  the  definition  of  procedures.  This  further  elaboration 
of  RDF  has  the  potential  to  transform  data  management  into  an 
application  of  knowledge  engineering,  and  more  specifically  of 
artificial  intelligence  (AI).  This  reclassification  reflects  the  dilution 
of  the  distinction  between  data  management  and  data  analysis  that 
is  apparent  even  in  an  introduction  as  brief  as  this  one.  Another 
clear  indication  of  this  transformation  is  that  it  re-ignites  the 
opposition  between  data-driven  and  rule-driven  designs  for 
semantic web representation [38-42], a recurring topic in AI. It
is  important  to  note  that  the  management  model  proposed  here  is 
orthogonal  to  that  discussion.  Its  purpose  is  solely  to  enable  the 
distribution  [43]  of  a  semantic  data  management  system  that  can 
withstand  changes  in  the  domain  of  discourse,  independently  of 
the  rationale  for  the  changes  themselves. 

Software  engineering  for  Bioinformatics 

This  overview  of  modern  trends  in  integrative  data  management  is 
as significant for what is covered as for what is missed: what
management models should be used to control the generation and
transformation of the data model? It is interesting to note that the
management  models  that  associate  access  permissions  with  the 
population  of  a  data  model  have  traditionally  been  the  province  of 
software  engineering.  This  may  at  first  appear  to  be  a  reasonable 
solution.  Since  instances  of  a  data  structure  in  conventional 
databases are contained in a defined digital medium, permission
management  is  an  issue  of  access  to  the  system  itself.  However,  this 
ceases  to  be  the  case  with  the  semantic  web  RDF  triples  because  they 
weave  data  structures  that  can  expand  indefinitely  between  multiple 
machines.  Presently,  the  formalisms  to  manage  data  in  the  semantic 
web  realm  are  still  in  the  early  stages  of  development,  notably  by  the 
World Wide Web Consortium (W3C) SKOS initiative (Simple
Knowledge Organization System). This initiative recently issued a
call [44] for user cases where good design criteria can be abstracted
and recommendations be issued on standard formats. As expected [15],
the Life Sciences present some of the most convoluted user
cases in which a multitude of naive domain experts effectively need
to maintain data structures that are as diverse and fluid as the
experimental evidence they describe [24].

Materials  and  Methods 

The most extreme combination of heterogeneous data structures
and the need for very tight control of access is arguably found
in applications to Personalized Medicine, such as those emerging
for cancer treatment and prevention. At the University of Texas
MD Anderson Cancer Center at Houston and the Southwestern
Medical  Center  at  Dallas  we  have  deployed  the  S3DB  semantic 
web  prototype  to  engage  the  community  of  translational 
researchers  of  the  University  of  Texas  Lung  Cancer  SPORE 
[45]  in  identifying  a  suitable  management  model.  This  exercise 
involved  over  one  hundred  researchers  and  close  to  half  a  million 
data entries, of clinical and molecular nature. Right from the outset,
integrating access permissions in the definition of the data models
was identified as an absolute necessity by the participants, as
anticipated by the SKOS group. As a consequence, a data driven
“core  model”,  S3DBcore,  that  accommodates  management 
specifications  as  part  of  data  representation,  was  developed  and 
is described here. The software is provided as open source
at www.s3db.org. Only open source tools were used in the
development of this web-based web service: PHP 5 was used for
server  side  programming  and  both  MySQL  and  PostgreSQL  were 
tested  as  the  relational  backbone  for  PHP’s  database  abstraction 
class.  At  the  same  location  detailed  documentation  about  S3DB’s 
Application  Programming  Interface  (API)  is  also  provided. 

Results 

Units  of  representation 

The  most  fundamental  representation  of  data  is  that  of 
attribute-value (AV) pairs, for example, <color, “blue”>. The
generic data management infrastructure proposed here can be
described as that of encapsulating AV pairs through the use of
another fundamental unit of representation, the Entity-Relation-Entity
model (ER), such as <sky, has, color>. Each entity can
then be associated with one or more AV pairs using the
entity-attribute-value (EAV) model, for example, <sky, color, “blue”>.
Fast forwarding three decades of computer science and knowledge
engineering, we reach the present day development of a
representation  framework  where  each  element  of  the  triple  is  a 
resource  with  a  unique  identifier,  with  the  third  element  of  the 
triple having the option of being a literal, that is, of having an
actual value rather than a placeholder. This single sentence very
broadly  describes  the  Resource  Description  Framework  (RDF) 
which  is  at  the  foundation  of  the  ongoing  development  of  the 
Semantic  Web  [29],  just  like  hypertext  (HTML)  was  the  enabling 
format  for  the  original  Web.  It  is  important  to  note  that  the 
evolution  of  representation  formats  typically  takes  place  through 
generalization of the existing ones. For example, extensible markup
language-based files (XML) are still text files, and RDF documents
are  still  XML  structures  (Figure  2).  As  noted  earlier,  this  succession 
is  closely  paralleled  by  refinements  of  software  design  patterns 
(Figure  1).  This  reification  process  is  often  driven  by  the  necessity 
to  maintain  increasingly  complex  data  at  a  simpler  level  of 
representation  where  they  remain  intelligible  for  those  who 
generate and use the data. Accordingly, in the next section triple
relations will be woven around the AV pair with that exact
purpose:  to  produce  a  core  model  that  is  simple  enough  to  be 
usable  by  naive  users  that  need  to  interact  with  heterogeneous  data 
hosted  in  a  variety  of  machines  (Figure  3),  yet  sophisticated  enough 
to  support  automated  implementation. 
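
The units of representation introduced in this section can be written down as a minimal Python sketch. The class names, the `Literal` wrapper and the placeholder namespace http://example.org/ are assumptions chosen for illustration; they are not part of S3DB or of any RDF library.

```python
from typing import NamedTuple, Union


class AV(NamedTuple):
    """Attribute-value pair, e.g. <color, "blue">."""
    attribute: str
    value: str


class Triple(NamedTuple):
    """Generic triple, used here for both the ER and the EAV forms."""
    subject: str
    predicate: str
    obj: str


class Literal(NamedTuple):
    """An actual value rather than a reference to another resource."""
    value: str


class RDFTriple(NamedTuple):
    subject: str                 # URI
    predicate: str               # URI
    obj: Union[str, Literal]     # URI or literal


NS = "http://example.org/"  # placeholder namespace, an assumption


def to_rdf(eav: Triple) -> RDFTriple:
    """Name subject and predicate by URI; keep the value as a literal."""
    return RDFTriple(NS + eav.subject, NS + eav.predicate, Literal(eav.obj))


av = AV("color", "blue")
er = Triple("sky", "has", "color")           # <sky, has, color>
eav = Triple("sky", av.attribute, av.value)  # <sky, color, "blue">
rdf = to_rdf(eav)
```

The only change between the EAV triple and its RDF counterpart in this sketch is that the first two elements become globally resolvable names, while the third remains free to hold a literal.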

Weaving  a  distributed  information  management  system 

The  objective  of  this  exercise  is  to  produce  a  data  management 
model that can be distributed through multiple deployments of the
Database Management System (DBMS), which implies a mechanism
for migrating access permissions. Simultaneously, this model
should  allow  different  domain  experts  to  evolve  their  own  data 
models without compromising pre-existing data. Achieving these
two goals simultaneously can only be realized if the proposed
distributed system is composed of node applications that are not only
syntactically interoperable, but also semantically transparent. For a
discussion of the absolute need for evolvable data models in the Life
Sciences see [24]. That report is also where the DBMS prototype,
S3DB, was first introduced (version 1.0). Finally, the Application
Programming Interface (API) needs to support the semantic
interoperability in a way that spans multiple deployments
(Figure 3). The data model developed to achieve these goals is
described in Figure 4.

Figure 3. Illustration of the desirable functionality: distinct users, with identities (solid icon) managed in distinct S3DB
deployments (circular compartments), which they control separately, share a distributed and overlapping data structure (arrows
between symbols) that they also manage independently: some data elements are shared (mixed color symbols), others are not. This
will require the identity verification to propagate between deployments peer-to-peer (P2P, dotted lines), including to deployments where neither
user maintains an identity (dotted circular compartment). This is in contrast with the conventional approach of having distinct users manage insular
deployments with permissions managed at the access point level.
doi:10.1371/journal.pone.0002946.g003

A  Core  data  management  model  that  is  universal  and 
distributed 

The  directed  labeled  graph  nature  of  RDF  triples,  coupled  with 
their  reliance  on  unique  identifiers  (as  URIs),  enables  data  structures 
to  be  scattered  between  multiple  machines  while  permitting  different 
domains  of  discourse  to  use  the  same  data  elements  differently. 
However, those two characteristics alone do not address the
management issue: how to decide when, where and what can be
viewed,  inserted,  deleted  and  by  whom.  It  is  clear  that  the 
conventional  approach  of  dealing  with  permissions  at  the  level  of 
access  to  the  data  store  is  not  appropriate  to  the  Life  Sciences  [5] 
where  multiple  disciplines  and  facilities  are  contributing  to  a  partially 
overlapping  representation  of  the  system.  It  cannot  be  overstated 
that  this  is  particularly  the  case  when  the  system  is  designed  to  host 
clinical  data.  To  solve  this  problem  we  have  developed  a  core  data 
model  where  membership  and  permission  can  migrate  with  the  data. 
We  have  also  developed  a  prototype  application  to  support  such  a 
distributed  data  management  system  (Figure  3),  which  we  make 
freely available as open source [25].

Discussion 

The proposed core model is detailed in Figure 4 and will now be
discussed in more detail. This diagram is best understood
chronologically, starting with the very basic and nuclear collection
of attribute-value pairs and then proceeding to their encapsulation
by three consecutive layers: the semantic schema, the assignment of
membership and, finally, the permission propagation.

Figure 4. Core model developed for S3DB (supported by version 3.0 onwards). This diagram can be read starting from the most fundamental
data unit, the Attribute-Value pair (filled hexagonal and square symbols). Each element of the pair is the object of two distinct triples, one describing the
domain of discourse, the Rules, and the other made of Statements where that domain is populated to instantiate relationships between entities. The latter
includes the actual Values. Surrounding these two nuclear collections of triples is the resolution of Collection and its instantiation as Item that define the
relationship between the individual elements of Rules and Statements. The resulting structure is then organized in Projects in such a way that the domain
of discourse can nevertheless be shared with other Projects, in the same or in a distinct deployment of S3DB. Finally, a propagation of user permissions
(dashed line) is defined such that the distribution of the data structures can be traced. See text for a more detailed description.
doi:10.1371/journal.pone.0002946.g004

PLoS ONE | www.plosone.org | August 2008 | Volume 3 | Issue 8 | e2946

Integrative Bioinformatics

Schema 

The  first  layer  of  encapsulation  is  the  definition  and  use  of  a 
domain  of  discourse  (elements  in  red  in  Figure  4).  This  was 
achieved  in  typical  RDF  fashion  by  defining  two  sets  of  triples,  one 
defining  a  set  of  rules  and  the  second,  the  statements,  using  them. 
As  discussed  elsewhere  [24],  there  are  good  reasons  to  equip  those 
who  generate  the  data  with  the  tools  to  define  and  manage  their 
own domains of knowledge. The ensuing incubation of experimental
ontologies was facilitated by an indexing scheme that
mimics the use of subject, verb, object in natural languages. This
indexing is achieved by recognizing Collections and the Items they
contain as elements of the two sets of nuclear triples (Rules and
Statements).

Organization 

The  second  layer  of  formal  encapsulation  corresponds  to  the 
assignment  of  membership.  This  process  extends  the  designation  of 
Items  in  the  previous  level,  by  assigning  the  Collections  that  contain 
them  and  Rules  that  relate  them  to  Projects  that  are  hosted  by 
individual  Deployments  of  the  prototype  S3DB  application.  In  the 
diagram, the membership dependencies are accordingly labeled as
rdfs:subClassOf [29]. Note that memberships can also be established
with remote resources (dotted lines in Figure 4), that is, between
resources of distinct deployments. Defining remote memberships
presents little difficulty in the RDF format because each element of
the triple is referred to by a universal identifier (a URI), unique
across deployments. On the other hand, managing permission to
access  the  remote  content  is  a  much  harder  problem,  which  we  will 
address  by  supporting  migration  of  identity.  The  alternative  solution 
to  migration  of  identities  is  migrating  the  contents  along  membership 
lines.  However,  that  was,  unsurprisingly,  found  to  be  objectionable 
by  users  with  a  special  attention  to  privacy  and  confidentiality  issues. 
It  would  also  present  some  logistic  challenges  for  larger  datasets.  In 
contrast, the temporary, portable identity key or
token needed for migration of identity is typically far
smaller than the content to which it grants access.

Permissions 

The  final  layer  of  encapsulation  defines  Users  and  Groups  within 
Deployments  and  controls  their  permissions  to  the  data  (blue  in 
Figure 4). As with the rest of the core model, the identification of the
proposed management of permissions was directed by user cases.
That  exercise  determined  that  user  identities  should  be  maintained 
by  specific  Deployments  of  S3DB  but  also  that  they  may  be 
temporarily  propagated  to  other  deployments.  That  solution, 
illustrated  in  Figure  3,  allows  one  application  to  request  the 
verification  of  an  identity  in  a  remote  deployment,  which  then 
verifies  it  in  the  identity’s  source  deployment  and  assigns  it  a 
temporary  key  or  token,  say,  for  one  hour.  All  that  is  propagated  is 
a  unique  alphanumeric  string,  the  temporary  token,  paired  with 
the  user’s  URI.  No  other  user  information  is  exchanged.  As  a 
consequence,  for  the  remainder  of  the  hour,  the  identification  will 
be  asynchronously  available  in  both  deployments,  which  enables 
the  solution  described  in  Figure  3,  where  a  single  interface  can 
manipulate  multiple  components  of  a  large,  distributed  systems 
level  representation  of  the  target  data.  Interestingly,  because  the 
multiple  deployments  of  S3DB  are  accessed  independently  by 
multiple  deployments  of  various  applications,  the  mode  of 
syntactic  interoperation  is  de  facto  peer-to-peer.  The  propagation 
of permissions flows in the sequence indicated by the dashed blue
lines in Figure 4. When a permission level is not defined for a
resource, say for an Item, then it is borrowed from the parent entity,
in  this  example,  from  the  corresponding  Collection.  When  there  is  a 
conflict  then  the  most  restrictive  option  is  selected.  For  example  a 
conflict  can  arise  for  a  Statement  which  inherits  permissions  from 
both  Rules  and  Collections.  Another  frequent  example  happens  when 
a  user  belongs  to  multiple  groups  with  distinct  permissions  to  a 
common  target  resource. 
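
The temporary propagation of identity described above can be sketched as follows. This is a simplified illustration and not the S3DB implementation: the `Deployment` class, its method names and the `/U/` URI layout are hypothetical, and the verification step is reduced to a membership check, whereas a real deployment would authenticate the user's credentials over the network.

```python
import secrets
import time

TOKEN_LIFETIME_S = 3600  # "say, for one hour"


class Deployment:
    """Hypothetical stand-in for one deployment of the DBMS."""

    def __init__(self, url):
        self.url = url
        self.users = set()   # URIs of identities maintained here
        self.tokens = {}     # token -> (user URI, expiry time)

    def add_user(self, name):
        uri = f"{self.url}/U/{name}"  # assumed URI layout
        self.users.add(uri)
        return uri

    def confirm_identity(self, user_uri):
        """Source deployment confirms that it maintains this identity."""
        return user_uri in self.users

    def accept_remote_user(self, user_uri, source, now=None):
        """Verify the identity at its source, then issue a temporary token.

        Only the token and the user's URI are stored locally; no other
        user information is copied between deployments.
        """
        now = time.time() if now is None else now
        if not source.confirm_identity(user_uri):
            return None
        token = secrets.token_hex(16)
        self.tokens[token] = (user_uri, now + TOKEN_LIFETIME_S)
        return token

    def is_valid(self, token, user_uri, now=None):
        now = time.time() if now is None else now
        entry = self.tokens.get(token)
        return entry is not None and entry[0] == user_uri and now < entry[1]


home = Deployment("https://a.example.org/s3db")
remote = Deployment("https://b.example.org/s3db")
alice = home.add_user("alice")
token = remote.accept_remote_user(alice, source=home, now=0)
```

The explicit `now` argument exists only so that expiry can be demonstrated without waiting; in this sketch the token stops being accepted once the hour has elapsed.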

Permission  management  is  a  particularly  thorny  issue  in  life 
sciences  applications  because  of  the  management  of  multiple  data 
provenances.  Relying  on  distributed  hosting  of  the  complementary 
data  sources  compounds  the  management  of  multiple  permissions 
even  further  because  it  also  involves  multiple  permission 
management  systems.  Finally,  permission  management  is  often 
treated  ad  hoc  by  the  management  systems  themselves  where  it  is 
resolved  as  access  permission  to  the  system  as  a  whole  rather  than 
being  specified  in  the  data  representation.  Because  each  source 
often  describes  a  specialized  domain,  it  is  guarded  with 
understandable  zeal.  We  argue  here  that  propagation  of 
permissions  is  the  only  practical  solution  to  determine  how  much 
information  is  to  be  revealed  in  different  contexts.  Consequently, 
whereas the relationships between the 8 S3DB entities (oval
symbols in Figure 4) are defined using RDF Schema (RDFS) [26],
and their tagging uses the well established Dublin Core [46], the
permission propagation layer is a novel component of the
proposed management model. In order to respond to the widest
range of the user cases driving model identification, the
propagation  was  defined  by  three  parameters,  view,  edit,  and 
use.  Each  of  these  parameters  can  have  three  values,  0,  1  or  2, 
corresponding  to,  respectively,  no  permission,  permission  only  on 
entries  submitted  by  the  user,  and  permission  on  all  entries  of  that 
resource.  Users  and  Groups  (blue  entities  in  Figure  4)  can  have  these 
three types of permissions on Projects, Collections, Rules, Items and
Statements. Among those five entities, additional permissions can be
issued; for example, a Project may have specific permissions on
Collections  and  Rules.  Collections  may  have  further  permissions  on 
their  Items.  The  same  reasoning,  in  reverse,  establishes  what  should 
happen  when  permission  is  not  specifically  defined  for  a  given 
entity.  For  example,  for  a  Statement  the  permission  would  be 
inherited from the parent entities, Item and Rule. If those two
entities did not define specific permissions for the target statement,
then  those  are  searched  upstream  (Figure  4)  until  reaching  the 
Project  or  even  Deployment  level.  According  to  this  mechanism,  the 
conventional  role  of  a  system  administrator  corresponds  to  a  user 
with  permissions  222  at  Deployment  level.  It  is  worth  recalling  that 
propagation  of  permissions  between  data  elements  in  distinct 
S3DB deployments happens through the sharing of membership
in external Collections and Rules (dotted lines), not through
extending the permission inheritance beyond the local deployment.
This is not a behavior explicitly imposed on the distributed
deployment;  it  emerges  naturally  from  the  fact  that  Rule  sharing 
specifies  a  permission  which,  remote  or  local,  interrupts  the 
permission  inheritance.  In  practice  both  the  user  of  the  interface 
and  the  programmer  using  the  API  can  ignore  the  intricacies  of 
this  process,  which  was  identified  to  be  the  intuitive,  sensible, 
propagation  of  permissions  that  we  found  naive  users  to  expect  in 
user-case  exercises. 
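
The inheritance and conflict resolution of permissions described in this section can be sketched in Python for a single user. The `Entity` class and the function names are hypothetical, and the sketch makes one interpretive assumption that the text does not spell out: that "most restrictive" means the numerically lowest level of each of the three parameters.

```python
from typing import Optional, Sequence, Tuple

Permission = Tuple[int, int, int]  # (view, edit, use); each level is 0, 1 or 2


def parse(code: str) -> Permission:
    """Turn a three-digit code such as "211" into (view, edit, use)."""
    if len(code) != 3 or any(c not in "012" for c in code):
        raise ValueError(f"invalid permission code: {code!r}")
    return (int(code[0]), int(code[1]), int(code[2]))


def most_restrictive(perms: Sequence[Permission]) -> Permission:
    """Resolve a conflict by keeping the lowest level of each parameter."""
    view, edit, use = (min(levels) for levels in zip(*perms))
    return (view, edit, use)


class Entity:
    """Hypothetical node of the core model with optional own permission."""

    def __init__(self, name: str, parents: Sequence["Entity"] = (),
                 permission: Optional[Permission] = None):
        self.name = name
        self.parents = list(parents)
        self.permission = permission


def resolve(entity: Entity) -> Optional[Permission]:
    """Use the entity's own permission, else search upstream via parents."""
    if entity.permission is not None:
        return entity.permission
    inherited = [p for p in map(resolve, entity.parents) if p is not None]
    return most_restrictive(inherited) if inherited else None


def allowed(level: int, is_owner: bool) -> bool:
    """Level 2 covers all entries; level 1 only the user's own entries."""
    return level == 2 or (level == 1 and is_owner)


deployment = Entity("Deployment", permission=parse("222"))
project = Entity("Project", [deployment], parse("211"))
collection = Entity("Collection", [project])       # inherits from Project
rule = Entity("Rule", [project], parse("200"))     # defines its own
item = Entity("Item", [collection])
statement = Entity("Statement", [item, rule])      # two parents, may conflict
```

In this sketch a permission defined on an entity stops the upstream search at that entity, which mirrors the observation above that an explicitly specified permission interrupts the inheritance.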

Portability 

This  discussion  would  not  be  complete  without  unveiling  some 
defining  technical  details  about  how  portability  is  addressed  by  this 
design.  So  far  we  have  been  loosely  equating  “unique  identifiers” 
with  the  use  of  Uniform  Resource  Identifiers  (URI).  More 
specifically,  the  right  hand  side  of  Figure  4  includes  a  list  of  eight 
Figure 5. Snapshots of interfaces using S3DB's API (Application Programming Interface). These applications exemplify why the semantic
web designs can be particularly effective at enabling generic tools to assist users in exploring data documenting very specific and very complex
relationships. Snapshot A was taken from S3DB's web interface, which is included in the downloadable package [25]. This interface was developed to
assist in managing the database model and, therefore, is centered on the visualization and manipulation of the domain of discourse, its Collections of
Items and Rules defining the documentation of their relations. The application depicted in snapshots B-D is a document management tool,
S3DBdoc, freely available as a Bioinformatics Station module (see Figure 6). The navigation is performed starting from the Project (C), then to the
Collection (B) and finally to the editing of the Statements about an Item (D). Snapshot B illustrates an intermediate step in the navigation where
the list of Items (in this case samples assayed by tissue arrays, for which there is clinical information about the donor) is being trimmed according to
the properties of a distant entity, Age at Diagnosis, which is a property of the Clinical Information Collection associated with the sample that
originated the array results. This interaction would have been difficult and computationally intensive to manage using a relational architecture. The
RDF formatted query result produced by the API was also visualized using a commercial tool, Sentient Knowledge Explorer (IO-Informatics Inc),
shown in snapshot E, and by Welkin (snapshot F), developed by the digital inter-operability SIMILE project at the Massachusetts Institute of Technology. See text
for discussion of graphic representations by these tools. To protect patient confidentiality some values in snapshots B and D are scrambled and
numeric sample and patient identifiers elsewhere are altered.
doi:10.1371/journal.pone.0002946.g005
types of locally unique identifiers that can be assigned to the same
number  of  entities  that  define  the  core  model.  It  is  easy  to  see  how 
this  indexing  can  be  made  globally  unique  by  concatenating  them 
with  the  Deployment's  ID,  itself  unique,  for  example  using  its  URL. 
Indeed  this  is  what  is  supported  by  the  accompanying  prototype 
software, with a generalizing twist with very significant consequences:
Did can either be the deployment address or anything
that  indicates  what  that  address  is.  For  example,  it  can  indicate  an 
HTML  document  or  even  an  entry  in  a  database  where  this 
address  is  specified.  More  interestingly,  it  can  also  be  a  simple 
alphanumeric  code  that  is  maintained  at  www.s3db.org  in 
association  with  the  actual  URL  of  the  target  deployment.  The 
flexible  global  indexing  achieved  by  either  scenario  allows  the 
manipulation of entire database management systems as portable
data  structures.  It  also  allows  for  novel  management  solutions 
through  manipulation  of  the  DBMS  logical  structure.  For 
example,  defining  a  Did  as  ‘localhost’  would  have  the  effect  of 
severing  all  logical  connections  to  any  usage  outside  that  of  the 
server  machine.  None  of  these  more  fanciful  configurations  were 
validated  with  the  Lung  Cancer  SPORE  user  community  even  if 
they  are  fully  supported  by  the  accompanying  prototype. 
Nevertheless, their possibility enables some interesting scenarios for
data  management  and  indeed  for  Knowledge  Engineering. 
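
The global indexing scheme described in this section can be sketched as follows. The registry contents, the code `D1234`, the local identifiers `I42` and `R7`, the function names and the `/` separator are all hypothetical; the text states only that local identifiers are concatenated with a Did, and that a Did may be either the deployment address or something that indicates it, such as an alphanumeric code maintained at www.s3db.org.

```python
# Hypothetical lookup table standing in for a central registry that
# associates short alphanumeric codes with deployment URLs.
REGISTRY = {"D1234": "https://lab.example.org/s3db"}


def resolve_did(did, registry=REGISTRY):
    """Return the deployment address that a Did stands for."""
    if did.startswith(("http://", "https://")):
        return did.rstrip("/")          # the Did is the address itself
    if did == "localhost":
        return "http://localhost"       # identifiers stay on the server machine
    if did in registry:
        return registry[did].rstrip("/")  # the Did is a registered code
    raise KeyError(f"unknown deployment identifier: {did}")


def global_id(did, local_id):
    """Make a locally unique identifier globally unique."""
    return f"{resolve_did(did)}/{local_id}"


by_code = global_id("D1234", "I42")
by_url = global_id("https://lab.example.org/s3db/", "I42")
```

Because both forms of Did resolve to the same address in this sketch, a deployment could be moved to a new URL by updating the registry entry alone, which is one reading of how entire database management systems become portable data structures.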

User  Interfaces 

The  ultimate  test  for  a  data  management  model  is  the 
intuitiveness of what it communicates through the user interface
[47,48]. The structure of S3DBcore offers some useful
guidelines  in  this  regard.  The  experimental  values  are  represented 
in  a  combination  of  Items  and  Statements  (Figure  4).  There  are  two 
routes  to  that  endpoint.  One  possibility  is  to  take  the  document 
management approach of navigating from Projects to Collections,
then  to  their  Items  and  finally  to  the  Statements.  This  is  the  scenario 
that  will  suit  data  centric  activities  such  as  querying  and  updating 
existing  data  or  inserting  new  data.  A  real,  working  example  of 
how that interface may look is depicted in Figure 5-B, which
details an intermediate step between selecting a Project (Figure 5-C),
and identifying and manipulating an individual entry made of
multiple statements about an Item (Figure 5-D). The mechanism used
to  distribute  rich  graphics  applications  and  their  interoperation 
with  S3DB  is  detailed  in  Figure  6.  Another  possibility  is  to  navigate 
from the Project to the collection of Rules, most likely represented as
a directed labeled graph network, and then browse the Statements as
an instantiation of the Rules, exemplified by another snapshot of a
working application, Figure 5-A. This application is the standard
web-based user interface distributed with the S3DB package [25].
Unlike  the  bookkeeping  approach  of  the  document  centric  model 
(Figure  5-B),  the  rule  centric  view  (Figure  5-A)  is  most  suitable  to 
investigate  the  relationship  between  different  parts  of  the  domain 
of  knowledge  and  to  incubate  [24]  a  more  comprehensive  and 
exact  version  of  the  ontology.  However,  and  this  may  be  the  most 
relevant  point,  since  S3DB’s  API  returns  query  results  as  RDF,  any 
RDF browser can be used to explore it. This point is illustrated in
Figures 5E and 5F where, respectively, a commercial semantic web
knowledge explorer (Sentient, IO-Informatics Inc) and Welkin, a
popular RDF browser developed at the Massachusetts Institute of
Technology, are used to visualize the same S3DB Lung Cancer
project depicted in Figs. 5A and B. Whereas the former is designed as
a  tool  for  knowledge  discovery,  the  latter  offers  a  global  view  of 
distributed  data  structures.  The  value  of  the  core  model  described  in 
Figure  4  as  a  management  template  for  individual  data  elements  will 
be  apparent  upon  close  inspection  of  Fig.  5E.  The  different  colors, 
automatically  set  by  Sentient  KE,  distinguish  the  core  model  (pink), 
where permission management takes place, from the instantiation of
their entities, in yellow. These two layers describe the context for
individual entries specifying the age at surgery of 5 patients. The
same display includes access to molecular work on tumor samples, in
this case using tissue arrays and DNA extracts. The distinct domains
are therefore integrated in an interoperable framework in spite of the
fact that they are maintained, and regularly edited, by different
communities of researchers. As a consequence, the database can
evolve with the diversification of data gathering methodologies and
with the advancement in understanding the underlying processes. In
Figure 5F it can be seen that MIT’s Welkin RDF visualizer easily
distinguished the query results as the interplay of 4 collections of 380
Statements about 41 Items from 5 Collections related by 40 Rules. For
comparison, see Figure 5E where one of its Statements is labeled
(describing that Age of patient providing pathology sample #90 with
Clinical Information #13646 is 90 years old), along with the parent
entities. For examples of other Statements about the same Item see
Fig. 5D. For examples of other statements of the same nature (about
the same domain), see 4 statements listed at the bottom-right of
Figure 5E.

Figure 6. Prototype infrastructure for integrated data management
and analysis being tested by the Univ. Texas Lung
cancer SPORE. The system is based on two components, a network of
universal semantic database servers and a code distribution server that
delivers applications in response to the use of ontology. Four distinct
user cases are represented, a-d, which rely on a combination of
download of interpreted code (green arrows) or direct access to
web-based graphic user interfaces or web-based API (blue arrows, in the
latter case using Representational State Transfer, REST). The dotted lines
represent regular updating of the application, propagating improvements
in the application code.
doi:10.1371/journal.pone.0002946.g006
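
The two navigation routes described above (Project to Collection to Item to Statements, and Project to Rule to Statements) can be illustrated with a minimal sketch. This is not the S3DB implementation or its API: the class names, method names, and triple layout below are assumptions chosen for illustration, and only the Clinical Information #13646 entry with Age 90 comes from the text; the second Item and its value are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    name: str


@dataclass
class Rule:
    # A Rule relates a subject Collection to an object through a verb.
    subject: Collection
    verb: str
    obj: str


@dataclass
class Item:
    item_id: int
    collection: Collection


@dataclass
class Statement:
    # A Statement instantiates one Rule for one Item with a concrete value.
    item: Item
    rule: Rule
    value: str


@dataclass
class Project:
    name: str
    statements: list = field(default_factory=list)

    def statements_about(self, item):
        """Document-centric route: all Statements about one Item."""
        return [s for s in self.statements if s.item is item]

    def statements_for(self, rule):
        """Rule-centric route: all Statements that instantiate one Rule."""
        return [s for s in self.statements if s.rule is rule]

    def to_triples(self):
        """Flatten Statements into (subject, predicate, object) triples,
        the shape an RDF browser would consume (hypothetical naming)."""
        return [
            (f"{s.item.collection.name}#{s.item.item_id}",
             f"{s.rule.verb}{s.rule.obj}",
             s.value)
            for s in self.statements
        ]


clinical = Collection("ClinicalInformation")
has_age = Rule(clinical, "has", "Age")
has_sample = Rule(clinical, "has", "PathologySample")
entry = Item(13646, clinical)   # entry labeled in Figure 5E
other = Item(13647, clinical)   # hypothetical second entry

project = Project("LungCancer")
project.statements += [
    Statement(entry, has_age, "90"),
    Statement(entry, has_sample, "#90"),
    Statement(other, has_age, "61"),  # hypothetical value
]

for triple in project.to_triples():
    print(triple)
```

Calling `project.statements_about(entry)` follows the bookkeeping route, whereas `project.statements_for(has_age)` follows the rule-centric route; both read the same underlying Statements.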

Conclusion 

The Semantic Web [15] technologies have the potential to
address the need for distributed and evolvable representations
that  are  critical  for  systems  Biology  and  translational  biomedical 
research.  As  this  technology  is  incorporated  into  application 
development  we  can  expect  that  both  general  purpose  productivity 
software  and  domain  specific  software  installed  on  our  personal 
computers  will  become  increasingly  integrated  with  the  relevant 
remote  resources.  In  this  scenario,  the  acquisition  of  a  new  dataset 
should  automatically  trigger  the  delegation  of  its  analysis.  The 
relevance  of  this  achievement  becomes  very  clear  when  we  note  that 
what  prevents  a  new  microarray  result  from  being  of  immediate  use 
to  the  experimental  Biologist  acquiring  it  is  not  the  computational 
capability of the experimentalist’s machine. Biostatisticians do not
necessarily have more powerful machines than molecular Biologists.
Moreover, in neither case is high end computation expected to be
performed in the client machine [8]. Rather, once data gathering and
data  analysis  applications  become  semantically  interoperable,  at  the 
very  least,  those  who  acquire  the  illustrative  microarray  data  should 
expect  their  own  machines  to  automatically  trigger  its  sensible 
analysis  by  background  subtraction,  normalization  and  basic 
multivariate  exploratory  analysis  such  as  dimensionality  reduction 
and  clustering.  As  a  consequence,  the  quantitative  scientist’s  role  can 
be  focused  on  defining  the  sensibility  of  alternative  contexts  of  data 
generation. 
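
The kind of analysis that could be triggered automatically can be sketched as a short pipeline. The sketch below is an assumption-laden illustration, not the procedure used by the authors: it covers background subtraction, a simple log2 median-centering normalization, and k-means clustering of arrays, omits dimensionality reduction for brevity, and uses made-up intensity values and hypothetical function names.

```python
import math
from statistics import median


def background_subtract(foreground, background, floor=1.0):
    """Subtract local background per spot, flooring at a small positive value."""
    return [max(f - b, floor) for f, b in zip(foreground, background)]


def log2_median_center(intensities):
    """Log2-transform one array, then center it on its own median."""
    logged = [math.log2(x) for x in intensities]
    m = median(logged)
    return [x - m for x in logged]


def kmeans(points, centers, iterations=20):
    """Plain k-means on equal-length vectors, starting from the given centers."""
    centers = [list(c) for c in centers]  # do not mutate the caller's data
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up foreground intensities for four arrays of four spots each;
# a constant background of 100 is assumed for every spot.
arrays = [
    [1124, 228, 4196, 164],
    [2148, 356, 8292, 228],
    [164, 4196, 228, 1124],
    [228, 8292, 356, 2148],
]
background = [100, 100, 100, 100]

normalized = [
    log2_median_center(background_subtract(a, background)) for a in arrays
]
labels = kmeans(normalized, centers=[normalized[0], normalized[2]])
print(labels)
```

In this toy data the first two arrays differ only by an array-wide scale factor, which the median centering removes, so they fall into the same cluster.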

The consequences of semantic integration are just as advantageous
for those dedicated to data analysis. Statistical analysts
typically  spend  the  majority  of  their  time  parsing  raw  datasets 
rather  than  assessing  the  reasonableness  of  alternative  analytical 
routes.  This  contrasts  with  the  critical  need  to  validate  any  given 
analysis by comparing results produced by alternative configurations
applied to independent experimental evidence. It is this final
step  that  ultimately  determines  the  sensibility  of  the  data  analysis 
procedures  triggered  by  the  acquisition  of  data.  In  summary,  any 
data  management  and  analysis  system  that  will  scale  for  systems 
level  analysis  in  the  Life  Sciences  has  to  be  semantically 
interoperable  if  automated  validation  is  to  be  attainable. 

In  this  report,  we  have  demonstrated  the  design  of  a  semantic 
web data model, S3DBcore, capable of delivering the desired

Deregulated EGFR Signaling during Lung Cancer Progression: Mutations,
Amplicons, and Autocrine Loops


Adi  F.  Gazdar  and  John  D.  Minna 

One or more members of the family of epidermal growth
factor receptor (EGFR) genes are overexpressed or otherwise
deregulated in virtually all epithelial tumors, including
non-small cell lung cancers (NSCLC). This and related observations
on the importance of protein phosphorylation and the
discovery that the first identified oncogene, v-Src, is a protein
kinase led John Mendelsohn and Gordon Sato to select EGFR
as the first target of molecular targeted therapy more than
20 years ago (1, 2). EGFR family members are deregulated
in cancers by the following three fundamental mechanisms:
activating gene mutations, increased gene copy number (via
amplification or polysomy), and altered ligand expression
(with possible formation of autocrine loops; ref. 3). Two reports
in this issue of the journal advance our understanding
of the role of all three mechanisms in the pathogenesis and
progression of NSCLC (4, 5). Before discussing these reports,
however, we will present background information on EGFR
signaling and its deregulation in cancers.

Reversible protein phosphorylation as a crucial regulator
of many essential cell functions has been elucidated over
the past 50 years. A superfamily of more than 500 highly
conserved protein kinase genes accounts for about 2% of the
genome (6). Specific kinases phosphorylate serine/threonine or
tyrosine residues or have dual specificity. The tyrosine kinases,
which catalyze the transfer of the γ-phosphate of ATP
to tyrosine residues on protein substrates, fall into two
classes: transmembrane receptors (receptor tyrosine kinases)
and nonreceptors. Subclass I of the receptor tyrosine kinases
is the EGFR family, which consists of four members: EGFR
(or EGFR1, ERBB1, HER1), EGFR2 (or ERBB2, HER2), EGFR3
(or ERBB3, HER3), and EGFR4 (or ERBB4, HER4; ref. 3).
Receptor-ligand interaction results in formation of homodimers
or heterodimers (between family members), activation
of the intrinsic kinase domain, and phosphorylation of specific
tyrosine residues in the cytoplasmic tail of the receptor.
The phosphorylated residues become docking sites for multiple
proteins, which in turn activate downstream signaling
pathways including the PI3K/AKT prosurvival, STAT transcription,
and RAS/RAF/MEK proliferation pathways.

Eleven members of the EGF family have been identified as
ligands for the EGFR family. HER2 is not ligand activated because
of its unique extracellular spatial structure but is the preferred
dimerization partner for other family members; its
heterodimers preferentially enhance ligand binding (7).
EGFR3 is "kinase dead" (i.e., it lacks intrinsic kinase activity)
and, as with HER2, functions via heterodimerization. The EGF
ligands show specificity for multiple homodimers or heterodimers
(7). Epiregulin is a pan-EGFR family ligand that preferentially
activates heterodimeric receptor complexes (8). The
EGF ligands are produced as transmembrane precursors that
are cleaved into their soluble forms by proteases ("sheddases")
of the ADAM family (especially ADAM10 and
ADAM17) or by matrix metalloproteinases, a process known
as ectodomain shedding (9). Other receptor pathways may
also activate EGFR signaling via "cross talk" and/or
"transactivation." An important new
example of this is the inflammatory
cytokine interleukin-6, which activates the Janus-activated
kinase/signal transducer and activator of transcription system,
which in turn activates EGFR pathway signaling. High
levels of interleukin-6 have been described in many cancers,
including EGFR-mutant lung cancers, providing an additional
method for EGFR activation and a new therapeutic target.

NSCLC cells can produce and release several of the EGF
ligands (10-12). Under certain circumstances, the
membrane-anchored isoforms and soluble growth factors also
may act as biologically active ligands. Therefore, depending
on the circumstances, these ligands may induce juxtacrine,
autocrine, paracrine, and/or endocrine signaling (13).
Establishing EGFR autocrine loops renders the cells sensitive to
inhibition by tyrosine kinase inhibitors (10, 12). Zhou et al.
(14) described the presence of an autocrine heregulin-EGFR3
loop associated with up-regulation of the sheddase
ADAM10. Inhibiting ADAM10 with a specific inhibitor prevented
the processing and activation of multiple EGF ligands.
Recent reports indicate that breast and NSCLC cells
(especially those with EGFR mutations) may produce large
amounts of interleukin-6, activating another autocrine loop
that drives tumorigenesis (15, 16).

Mutations of EGFR may target many regions of the gene,
especially the extracellular domain in glioblastomas (17) and
the kinase domain in lung cancers (18, 19). EGFR mutations
may play a major role in lung tumorigenesis but also leave
lung tumor cells dependent on EGFR signaling pathway activation
for growth and survival ("oncogene addiction"; refs.
19, 20). Therefore, inhibition of EGFR signaling by tyrosine kinase
inhibitors rapidly leads to apoptosis and growth cessation.
In the 4 years since the discovery of the mutations,
however, it has been realized that primary tumor response and
resistance to tyrosine kinase inhibitors are influenced by many
factors, including mutations, mutation type, and copy numbers
of EGFR; EGFR3 activation; KRAS mutations; MET amplification;
and others (21-23). Therefore, although some studies
(usually from single institutions analyzing highly selected
patient populations) have shown very high response rates
of EGFR-mutant tumors to tyrosine kinase inhibitors, large
multi-institutional clinical trials have often failed to show a
survival benefit of this approach, although increased copy
number of EGFR (and HER2 in some series) was associated
with a good treatment outcome (24, 25). Although EGFR
mutations and copy number gains may occur independently,
they occur together more frequently than alone (26, 27). In
addition, as with glioblastomas (17), the mutant allele is
preferentially amplified in such cases (26). Therefore, "triple
whammy" tumors (i.e., those with mutations, copy number
gains, and mutant allele-specific amplifications) are in all
probability highly oncogene addicted and likely to show dramatic
and sustained responses to appropriate targeted therapies.
Autocrine loops and other derangements of EGFR
signaling are frequent in all forms of NSCLC, which therefore
may involve tumors with more than three EGFR aberrations,
or "multiple whammy" tumors.

The finding that all of these different mechanisms activate
EGFR signaling in lung cancers signifies the presence and
great importance of strong selective pressures on the EGFR
signaling pathway in these cancers. This selectivity was dramatically
highlighted by the finding of EGFR tyrosine kinase
domain mutations, but lung cancer use of all these alternative
mechanisms is equally important in underscoring the key role
of the EGFR pathway in driving lung cancer pathogenesis. Of
course, these findings also highlight how versatile tumor cells
are in finding ways to activate the pathway. On a related note,
the relapse and subsequent drug resistance of lung cancers
that had responded to EGFR-targeting drugs (such as EGFR
tyrosine kinase inhibitors) show the resourcefulness of these
cancers in finding other ways to use the EGFR or other pathways
(e.g., KRAS, c-MET) to ward off extinction. Relapse and
resistance also highlight the need for tools that can determine
whether the pathway is active in individual lung cancers and
identify "sensitive" therapeutic target(s) for them. It is also important
to realize that the target is constantly changing, and thus
different therapeutic options are needed at different disease
stages.

We now evaluate the contributions of the articles by Zhang
et al. (4) and Tang et al. (5) in the context of the EGFR signaling
background detailed above. Lung cancer has a high mortality
that usually is due to the development of metastatic
lesions. Although relatively few studies have directly compared
the molecular changes in primary tumors with those
in corresponding metastatic tumors, the metastatic phenotype
is characterized by changes in multiple cellular pathways (28).
The study by Zhang et al. (4) was stimulated by previous
work from their laboratory showing that epiregulin is one of
the several highly expressed EGF ligands in EGFR-mutant
NSCLC cells (10). This group tested the hypothesis that epiregulin
is involved in the development of the metastatic phenotype.
Immunostaining studies confirmed their previous
observation that primary NSCLC tumors with localized disease
stages frequently (in 65% of cases) expressed the ligand.
They reported a significant correlation between ligand expression
and advanced nodal stage (stage II) and a trend toward
shorter survival. In vitro studies confirmed the role of epiregulin
in promoting tumor growth and invasion. These analyses
show a clear role for epiregulin in tumor cell survival, invasion,
and metastasis. Because epiregulin can stimulate multiple
members of the EGFR receptor family, activation of both
EGFR and EGFR3 signaling may contribute to carcinogenesis.
Because ligand expression is much more frequent than are
EGFR mutations or copy number gains, these findings provide
further evidence that autocrine loops may be the major
mechanism by which EGFR signaling is deregulated in all histologic
forms of NSCLC. Future studies should comprehensively
analyze all 11 EGF ligands found in lung cancers
because other members of this ligand group may have similar
tumor-promoting actions.

As mentioned earlier, EGFR mutations and copy gains occur
frequently in the same tumors. Previous studies have shown
widespread field effects throughout the respiratory epithelium
of smokers (29, 30), suggesting that tobacco exposure damages
the entire respiratory epithelium. Most EGFR mutations occur
in lung cancers of lifetime never smokers, which have a largely
unknown etiology (31). In their earlier work, the authors carefully
microdissected histologically normal respiratory epithelium
from small airways surrounding mutation-containing
tumors (32); often present in airways within or near the tumor
but seldom in distant sites, the mutations reflected a limited
field effect. Therefore, exposure and damage seem to be much
more limited in never smokers than in current or former
smokers. In their present study, Tang et al. conducted a more
extensive field study, assessing the presence of mutations and
copy number gains (by fluorescence in situ hybridization
technique) in primary NSCLC, corresponding metastases,
and histologically normal respiratory epithelium. As in their
previous study, mutations and EGFR protein overexpression
were a localized field effect. The key present findings are that
copy number gains were absent in normal epithelium and
were distributed heterogeneously in primary tumors and more
evenly in metastases. Tang et al. (5) have answered the question,
"Which came first, the chicken (copy number gains) or
the egg (mutations)?" The finding of mutant allele-specific
gains gives the nod to the egg.

The prototype EGFR gene is not the only EGFR pathway
gene amplified in NSCLC. A recent report describes amplification
of other pathway members including HER2, SHC1, and
AKT (33). Our unpublished work indicates that other pathway
genes including KRAS and BRAF may also be amplified in
NSCLC. Although mutations of pathway genes are usually
mutually exclusive, single tumors may contain copy number
gains for multiple genes or a single pathway mutation and one
or more pathway gene copy number gains.1

Two other recently published studies (34, 35) are consistent
with the findings of Tang et al. (5). Cancers arise as a result of
multistage processes, and a lesion known as atypical adenomatous
hyperplasia is recognized as a precursor or premalignant
lesion for peripheral lung adenocarcinomas. Atypical
adenomatous hyperplasia lesions progress to noninvasive
cancers known as bronchioloalveolar carcinomas as defined
by the strict criteria of WHO classification (36). Bronchioloalveolar
carcinoma tumors may become invasive and eventually
metastatic. Early invasive cancers may contain invasive and
noninvasive components that can be microdissected and examined
separately. By examining the various stages of lung
pathogenesis for EGFR mutations and copy gains, both reports
(34, 35) conclude that mutations are early, preinvasive
changes, whereas copy number gains are later events associated
with the invasive phenotype (Fig. 1).

1 Unpublished data.

All of these findings are consistent with the hypothesis that
mutations precede copy number gains, which may be associated
with the metastatic phenotype. Therefore, mutations
are likely to show little or no heterogeneity in primary or
metastatic tumors, and copy number gains may be absent or
heterogeneously distributed in primary tumors and relatively
evenly distributed within metastatic sites. Further work will
be needed to confirm that copy number gains are part of the
metastatic phenotype.

What are the clinical implications of these findings? The
data of Zhang et al. (4) suggest that about two thirds of all
NSCLCs express at least one of the EGF ligands. Testing the
expression of the other 10 known ligands in this cohort presumably
would have shown an even higher percentage. The
expression of EGFR protein in most NSCLCs, including squamous
cell carcinomas, raises the question of what mechanism
causes deregulation. Mutations and copy number gains
explain only a minority of these cases and probably are not
important mechanisms in squamous cell carcinomas. As suggested
by the data of Zhang et al. (4), activation of autocrine
(or paracrine or juxtacrine) loops is an attractive alternative
mechanism. If this loop is dependent on continued EGFR signaling
and is inhibited by tyrosine kinase inhibitor therapy, as
suggested by the data, this would be a plausible explanation
for why some nonmutant tumors of all histologic types with
nearly diploid copy number respond to tyrosine kinase inhibitor
therapy (24, 37). Future retrospective and prospective studies
are needed to determine whether EGF ligand expression
is an additional predictive factor for tyrosine kinase inhibitor
response. The concept that the driving force behind many or
most NSCLC tumors is EGF ligand receptor loops offers the
clinician the following additional avenues for potential targeted
therapies: preventing sheddase up-regulation or activity,
preventing ligand production directly or by inhibition of
the loop at a more upstream stage, targeting the soluble form
of the ligand, and preventing ligand-receptor interaction.

Fig. 1. Deregulation of the EGFR gene during the multistage pathogenesis
of peripheral lung adenocarcinomas. A, peripheral adenocarcinomas are
believed to arise from preneoplastic lesions known as atypical adenomatous
hyperplasias (AAH), which first progress to a preinvasive neoplastic stage
called bronchioloalveolar carcinoma (BAC). Foci of invasion may develop in
the fibrotic centers of bronchioloalveolar carcinomas, which then are called
invasive adenocarcinomas, although noninvasive elements may persist at the
edges of the tumors. Metastases ultimately develop (not shown). B, from the
article by Tang et al. (5) and from the literature cited in the text, EGFR
mutations commence early during pathogenesis and can be detected in
histologically normal respiratory epithelium near tumors (localized field
effect). Mutations are more frequent in preneoplastic (atypical adenomatous
hyperplasia) and preinvasive (bronchioloalveolar carcinoma) stages than in
normal epithelium. Therefore, there is relatively little heterogeneity of
mutations in invasive carcinomas, and the mutations contribute to tumor
pathogenesis. In contrast, gene copy number gains, often in the form of
amplifications, commence relatively late in pathogenesis, usually at the
tumor stage. They are more frequent in metastatic lesions, suggesting that
they may be progression events involved in the metastatic phenotype. Much
less is known about the timing of epiregulin loops (either autocrine,
paracrine, or juxtacrine). From the data of Zhang et al. (4), however, it
would seem that epiregulin loops can be detected in primary invasive tumors
but are more frequent or active during the metastatic stage. The dashed line
indicates that the timing of the appearance of these loops during earlier
preinvasive stages is unknown.

With the identification of deregulated expression of EGF
family ligands in lung cancer pathogenesis, we can now
consider using the relevant ligands for early cancer diagnosis,
identifying key therapeutic targets, and as biomarkers to
monitor response to chemoprevention or very early treatment.
Because the ligands are soluble, they potentially could
be detected in blood or bronchial lavage specimens in addition
to biopsy and brushing specimens. Furthermore, while
exploring their diagnostic and therapeutic targeting roles,
we need to understand the molecular mechanisms leading
to the deregulated expression of these ligands. Copy number
changes, mutations, promoter alterations (including epigenetic
changes), the role of specific transcription factors
(such as the lineage-specific oncogene TITF1), and altered
miRNA expression are all potential mechanisms that need
to be explored, as does ligand expression in cancer stem
cells.

Another major clinical interest is to understand the sequential
appearance of molecular changes during multistage
pathogenesis. The appearance of EGFR mutations at a preinvasive
and even at a premalignant phase creates opportunities
to use EGFR mutation markers for risk identification,
early detection, and prevention, particularly for never smokers,
who are at most risk for EGFR-mutant tumors and
for whom no such markers currently exist (31). Early EGFR
mutations also have important implications for the study of
EGFR tyrosine kinase inhibitors in the adjuvant/second primary
tumor prevention setting. Whereas mutations seem to
be initiating events, copy number gains are related to progression
and metastatic events. Therefore, heterogeneity
may occur both within the primary tumor and between the
primary tumor and metastatic sites. These considerations are
important if copy number gains are used as a marker for selecting
targeted therapies, and they indicate the importance
of testing for this marker in tumor samples obtained immediately
before therapy versus relying on marker data from
earlier samples.

The reports of Zhang et al. and Tang et al. in this issue of the
journal shed new light on the highly complex, multifaceted,
and as yet incompletely understood nature of the EGFR signaling
pathway. This pathway in NSCLCs and in the bronchial
epithelium of patients at a high lung-cancer risk will
be a critical focus of diagnostic, preventive, and therapeutic
efforts for the foreseeable future.

Tumor Suppressor FUS1 Signaling Pathway

Abstract: FUS1 is a novel tumor suppressor gene identified in the
human chromosome 3p21.3 region where allele losses and genetic
alterations occur early and frequently for many human
cancers. Expression of FUS1 protein is absent or reduced in the
majority of lung cancers and premalignant lung lesions. Restoration
of wt-FUS1 function in 3p21.3-deficient non-small cell
lung carcinoma cells significantly inhibits tumor cell growth by
induction of apoptosis and alteration of cell cycle kinetics. Here
we present recent findings indicating that FUS1 induces apoptosis
through the activation of the intrinsic mitochondrial-dependent
and Apaf-1-associated pathways and inhibits the function of
protein tyrosine kinases including EGFR, PDGFR, AKT, c-Abl,
and c-Kit. Intravenous administration of a nanoparticle-encapsulated
FUS1 expression plasmid effectively delivers FUS1 to
distant tumor sites and mediates an antitumor effect in orthotopic
human lung cancer xenograft models. This approach is the
rationale for an ongoing FUS1-nanoparticle-mediated gene delivery
clinical trial for the treatment of lung cancer.

Key  Words:  Tumor  suppressor  gene,  FUS1,  Signaling  pathway, 
Lung  cancer. 

(J Thorac Oncol. 2008;3:327-330)

Cytogenetic and allelotyping studies of fresh tumors and
tumor cell lines have shown that allele losses and genetic
alterations on the short arm of chromosome 3p (3p25, 3p21-22,
3p14, and 3p12-13) are among the most frequent and
earliest genomic abnormalities involved in a wide spectrum
of human cancers, including lung1-6 and breast.7-9 Multiple
overlapping homozygous deletions have also been found in
the 3p21.3 region, spanning a 120 kb genomic locus in human
lung and breast cancer cell lines.10,11 Chromosomal abnormalities
in the 3p21.3 region have been frequently detected in
smoke-damaged respiratory epithelium and preneoplastic
lesions.10,12,13 These findings suggest that one or more putative
3p21.3 tumor suppressor genes function as “gatekeepers” in
the molecular pathogenesis of lung and other human
cancers.10,12,14 The novel FUS1 gene is one of the nine candidate
TSGs (CACNA2D2, PL6, 101F6, FUS1, BLU, RASSF1,
NPRL2, HYAL2, and HYAL1) that were identified in this
region.1,14-17 In this review, we will describe a pathway
involved in FUS1-mediated tumor suppression and discuss
potential translational applications of the FUS1 TSG for
human lung cancer therapy.

Inactivation of FUS1 in Lung Cancer
Pathogenesis

The FUS1 gene may be inactivated in human cancer
cell lines and primary tumors by haploinsufficiency.16,17
Although single allele loss is common, only a few missense
mutations and C-terminal deletion mutations have been
identified in primary lung cancer samples, and there is no
evidence for promoter hypermethylation.6,16,17 FUS1 mRNA
transcripts could be detected on Northern blots of RNAs
prepared from some lung cancer cell lines, but no endogenous
FUS1 protein could be detected in a majority of non-small
cell lung carcinoma (NSCLC) cells and almost all of the
small-cell lung cancer (SCLC) cell lines tested.6,16,17
Myristoylation of the FUS1 N-terminus is required for tumor
suppressor activity.17 A loss of expression coupled with a
myristoylation defect of the FUS1 protein was detected in
primary lung cancers. The myristoylation-defective FUS1
protein has a greatly reduced half-life and is subject to rapid
proteasomal degradation.17 In a tissue microarray of 303
lung cancers, loss or reduction of FUS1 expression was
detected in 100% of SCLCs and 82% of NSCLCs.18 In
NSCLCs, loss or reduction of FUS1 expression was
associated with significantly worse overall patient survival.
Squamous metaplasia and dysplasia expressed significantly
lower levels of FUS1 than did normal and hyperplastic
bronchial epithelia. Lee et al.19 showed that the
translation of FUS1 was significantly down-regulated by
microRNA-378 targeting the 3'UTR of FUS1 mRNA and that
the ectopic expression of miR-378 enhanced cell survival,
tumor growth, and angiogenesis. A genetically engineered
mouse with a targeted disruption of the FUS1 gene
developed signs of autoimmune disease, showed an increased
frequency of spontaneous vascular tumor formation, and
had defects in natural killer cell maturation coupled with
IL-15 insufficiency.20 These findings suggest that loss of
FUS1 expression may play an important role in the early
pathogenesis of lung cancer.

The Role of FUS1 in the Intrinsic Apoptotic
Signaling Pathway

We previously used recombinant adenoviruses or
N-[1-(2,3-dioleoyloxy)propyl]-N,N,N-trimethylammonium
chloride/cholesterol nanoparticle-complexed plasmid vectors to
introduce FUS1 and other genes into lung cancer
cells.14,15,17,21,22 FUS1 showed the most potent proapoptotic
activity in human lung cancer cells among these candidate
3p21.3 TSGs.15-17,23 To identify the pathway involved in
FUS1-mediated apoptosis, we used ProteinChip array-based
SELDI mass spectrometry to analyze all of the protein
species in complexes immunoprecipitated by anti-FUS1
antibodies. The apoptotic protease-activating factor 1 (Apaf-1)
was identified as a potential cellular target of FUS1 protein by
its direct protein-protein interaction (Figure 1). A computer-based
analysis of the functional domains and signaling motifs
within the amino acid sequences of the FUS1 and Apaf-1 proteins
revealed Class II and Class I PDZ24,25 protein-protein interaction
motifs at the C-termini of FUS1 and Apaf-1,
respectively, providing a structural basis for the FUS1-Apaf-1
protein-protein interaction. Apaf-1 plays an important role in
the mitochondria-dependent apoptotic pathway.26-28 A relatively
high level of endogenous Apaf-1 protein was universally
detected in lung cancer cells. These Apaf-1 proteins
appeared to be functionally inactive, as indicated by their lack
of intrinsic ATPase activity, which is essential for Apaf-1-mediated
caspase activation and apoptosis induction, in both
cancer cells deficient in FUS1 expression and normal cells
with a low level of endogenous FUS1 expression.28,29 We
showed that activation of endogenous FUS1 in normal cells
in response to stress, such as UV irradiation, and the forced
expression of FUS1 in FUS1-deficient tumor cells can trigger
cytochrome c release from mitochondria into the cytosol and
cause FUS1 to bind Apaf-1 and recruit it to critical cellular
locations, thus activating Apaf-1 in situ, initiating Apaf-1-mediated
caspase activation, and inducing apoptosis.17,30,31
Although our proposed mechanism remains to be validated
by identifying all of the components in this complicated
apoptotic apparatus and their dynamic interactions, our
findings support a role for loss of FUS1 expression as a critical
event in lung cancer pathogenesis.

Inhibition  of  Tyrosine  Kinase  Signaling 
by  FUS1 

We found that reactivating FUS1 in 3p21.3-deficient
lung cancer cells inhibited their growth and induced
apoptosis, in part, by inhibiting protein tyrosine kinases (PTKs)
such as EGFR, PDGFR, c-Abl, c-Kit, and AKT (Figure 1).
Computer-based homology modeling of the FUS1 protein
sequence and structure32 predicts a potential protein kinase A
activation site and an A kinase anchoring protein homology
motif.17 It has been shown that a FUS1 peptide derived from
a region of the FUS1 protein sequence that was deleted in a
mutant FUS1 gene detected in some lung cancer cell lines
inhibits a constitutively active recombinant c-Abl tyrosine
kinase and the full-length c-Abl kinase in vitro.33
Platelet-derived growth factors (PDGFs) play a crucial role in cell
migration, proliferation, apoptosis, and cell survival. Forced
expression of wt-FUS1 by nanoparticle-mediated gene transfer
in the PDGFRβ-expressing SCLC H69 and H417 cell
lines inactivated PDGFRβ and its downstream targets, PI3K
and AKT kinases, as shown by marked reduction in PTK
phosphorylation.

FIGURE 1. Schema of the FUS1 pathway. Activation of FUS1 in normal cells in response to apoptotic stimuli or stress, or
restoration of wt-FUS1 function by ectopic gene transfer in FUS1-deficient tumor cells, activates the intrinsic mitochondrial
apoptosis pathway. Activated FUS1 triggers cytochrome c (Cyt C) release from the inner membrane of mitochondria to the
cytosol, selectively and directly interacts with Apaf-1 and recruits it to a critical subcellular location, and activates Apaf-1 by
induction of its ATPase activity in situ, thus facilitating downstream Apaf-1-mediated apoptosome assembly, caspase activation,
and apoptosis induction. Activation of FUS1 may also block MDM2-associated proteolytic degradation of p53 and enhance
the p53-dependent apoptotic pathway. The potent tumor suppressor activity of FUS1 is also in part mediated by its inhibition
of multiple oncogenic protein tyrosine kinases (PTKs), including EGFR, PDGFR, c-Abl, c-Kit, and AKT, that are up-regulated in
cancer cells. FUS1-mediated inactivation of these oncogenic PTKs leads to induction of apoptosis and inhibition of tumor cell
proliferation and survival.

We explored the ability of FUS1 expression to overcome
gefitinib resistance in NSCLC cells. We found that
reexpression of wt-FUS1 by FUS1-nanoparticle-mediated
gene transfer into FUS1-deficient and gefitinib-resistant
NSCLC cell lines that have wt-EGFR sensitized them to
gefitinib treatment and synergistically induced apoptosis.
FUS1 nanoparticle treatment alone or with gefitinib in
gefitinib-resistant NSCLC cells markedly inactivated EGFR and
AKT, as shown by decreased phosphorylation levels of these
proteins, and activated caspase-3, caspase-9, and PARP, as
shown by the increased cleavage of their precursor proteins
on Western blots. Together, these results suggest that
combination treatment with FUS1 and PTK inhibitors may be a
useful therapeutic strategy for human lung cancer.

Translational Applications of FUS1 for Lung
Cancer Therapy

We initiated a dose-escalation Phase I clinical trial of
FUS1-nanoparticles in patients with chemotherapy-refractory
stage IV lung cancer. In this clinical trial, a FUS1 expression
plasmid in a nanoparticle is injected intravenously in stage IV
lung cancer patients whose disease has progressed after cisplatin
combination chemotherapy. The trial continues to accrue patients.

We have also explored the combined effects of the
FUS1-nanoparticles with conventional chemotherapy and
external beam radiotherapy.31 Forced expression by
FUS1-nanoparticle-mediated gene transfer sensitized NSCLC cells
to cisplatin or γ-radiation, resulting in a 3- to 8-fold increase
in inhibition of tumor cell viability and induction of apoptosis
in FUS1-transfected cells. Systemic treatment with a
combination of FUS1 nanoparticles and cisplatin in a human lung
cancer orthotopic mouse model synergistically enhanced the
therapeutic efficacy of cisplatin.

We evaluated the combined effects of FUS1 and the
TSG p53 on tumor cell growth and apoptosis induction in
NSCLC cells cotransfected with FUS1 and p53.30 We found
that coexpression of wt-p53 with wt-FUS1, but not the
myristoylation mutant (mt-FUS1), synergistically inhibited
cell proliferation and induced apoptosis in human NSCLC
cells. We also found that coexpression of FUS1 and p53
enhanced the sensitivity of NSCLC cells to treatments with
the DNA-damaging agents γ-radiation and cisplatin. We
found that the observed synergistic tumor suppression by
FUS1 and p53 correlated with FUS1-mediated down-regulation
of MDM2 expression, resulting in the accumulation and
stabilization of p53 protein, and with the up-regulation of Apaf-1
expression with activation of the caspase cascade (Figure 1).
Our results demonstrate an important role for FUS1 in
modulating chemo- and radiosensitivities of lung cancer cells and
suggest that an optimal combination of molecular therapeutics,
such as the proapoptotic tumor suppressor FUS1-nanoparticle,
and conventional anticancer agents, such as cisplatin,
may be an effective treatment strategy for human lung cancer.

Abstract

Nanoparticle  quantum  dots  (QDs)  provide  sharper  and  more  photostable  fluorescent  signals 
than  organic  dyes,  allowing  quantification  of  multiple  biomarkers  simultaneously.  In  this  study, 
we  quantified  the  expression  of  epidermal  growth  factor  receptor  (EGFR)  and  E-cadherin 
(E-cad)  in  the  same  cells  simultaneously  by  using  secondary  antibody-conjugated  QDs  with  two 
different  emission  wavelengths  (QD605  and  QD565)  and  compared  the  cellular  distribution  of 
EGFR  and  E-cad  between  EGFR-tyrosine  kinase  inhibitor  (TKI)-insensitive  and  -sensitive  lung 
and  head  and  neck  cancer  cell  lines.  Relocalization  of  EGFR  and  E-cad  upon  treatment  with  the 
EGFR-TKI  erlotinib  in  the  presence  of  EGF  was  visualized  and  analyzed  quantitatively.  Our 
results showed that QD-immunocytochemistry (ICC)-based technology can not only quantify
basal  levels  of  multiple  biomarkers  but  also  track  the  localization  of  the  biomarkers  upon 
biostimulation.  With  this  new  technology  we  found  that  in  EGFR-TKI-insensitive  cells,  EGFR 
and  E-cad  were  located  mainly  in  the  cytoplasm;  while  in  sensitive  cells,  they  were  found 
mainly  on  the  cell  membrane.  After  induction  with  EGF,  both  EGFR  and  E-cad  internalized  to 
the  cytoplasm,  but  the  internalization  capability  in  sensitive  cells  was  greater  than  that  in 
insensitive  cells.  Quantification  also  showed  that  inhibition  of  EGF-induced  EGFR  and  E-cad 
internalization  by  erlotinib  in  the  sensitive  cells  was  stronger  than  that  in  the  insensitive  cells. 
These  studies  demonstrate  substantial  differences  between  EGFR-TKI-insensitive  and  -sensitive 
cancer  cells  in  EGFR  and  E-cad  expression  and  localization  both  at  the  basal  level  and  in 
response  to  EGF  and  erlotinib.  QD-based  analysis  facilitates  the  understanding  of  the  features 
of  EGFR-TKI-insensitive  versus  -sensitive  cancer  cells  and  may  be  used  in  the  prediction  of 
patient  response  to  EGFR-targeted  therapy. 


1.  Introduction 

3 Address for correspondence: Department of Hematology and Medical
Oncology, Winship Cancer Institute, Emory University School of Medicine,

In recent years, the science of nanotechnology has been
to provide advantages in molecular detection, imaging,
diagnostics,  and  therapeutics  in  the  cancer  field  [1,  2]. 
Quantum  dots  (QDs)  are  nanoscale  particles  made  from 
inorganic  semiconductors  and  have  large  molar  extinction 
coefficients  which  are  10-50  times  larger  than  those  of 
organic  dyes.  QDs  have  superior  signal  brightness, 
photostability,  longer  excited-state  lifetimes,  and  optimized 
signal-to-background  ratios  compared  with  organic  dyes  [3]. 
QDs  can  be  covalently  linked  to  biological  molecules  such 
as  peptides,  proteins,  and  nucleic  acids  [4,  5].  Thus,  they 
are  ideal  imaging  materials  for  molecular  profiling  [3-6].  A 
significant  advantage  of  QDs  over  immunofluorescence  using 
organic  dyes  and  fluorescence-activated  cell  sorting  (FACS)  is 
that  QDs  can  both  visualize  and  quantify  multiple  biomarkers 
simultaneously in the same material because they have broad
excitation and narrow emission spectra and can be excited
simultaneously through one appropriate excitation source.
This  allows  the  quantification  and  correlation  of  molecular 
signatures  with  cellular  response  to  targeted  therapies  [6,  7]. 

Epidermal  growth  factor  receptor  (EGFR)  is  a  170  kDa 
transmembrane  protein  with  intrinsic  tyrosine  kinase  activity 
that  regulates  cell  growth  in  response  to  binding  of  its  ligands, 
including epidermal growth factor (EGF) and transforming
growth factor-α (TGF-α). Overexpression of EGFR and its
ligand TGF-α is reportedly observed in 50-90% of non-small
cell lung carcinoma (NSCLC) [8, 9] and 80-90% of
squamous  cell  carcinoma  of  the  head  and  neck  (SCCHN) 
specimens  [10-13].  Several  studies  have  demonstrated  that 
EGFR  overexpression  correlates  with  reduced  disease-free 
and  overall  survival  [13-18].  Therefore,  many  strategies 
including  using  specific  tyrosine  kinase  inhibitors  (TKI)  and 
monoclonal  antibodies  to  target  EGFR  have  been  developed  for 
the  treatment  of  NSCLC  and  SCCHN.  However,  resistance  to 
EGFR-TKI  treatment  has  been  observed  in  lung,  SCCHN,  and 
other  types  of  cancer  [19].  The  relationship  between  EGFR 
expression  and  a  patient’s  response  to  EGFR-targeted  therapy 
is  currently  not  clear  [20-23]. 

Recent publications suggest that an epithelial-to-mesenchymal
transition (EMT) is a determinant of the sensitivity of
cancer cells to EGFR inhibition [24-26]. E-cadherin (E-cad)
expression in NSCLC and SCCHN tissue specimens
has  been  reported  in  several  studies  and  is  correlated 
with  tumor  progression  and  metastasis  [27-33].  Restoring 
E-cad  expression  enhanced  sensitivity  to  EGFR-targeted 
therapy  [34],  suggesting  that  E-cad  expression  may  be  required 
for  successful  targeting  of  EGFR.  Although  the  hypothesis  that 
E-cad  and  EGFR  may  interact  was  proposed  more  than  ten 
years  ago  [35],  the  effect  of  the  molecular  relationship  between 
EGFR  and  E-cad  on  EGFR-targeted  therapy  is  currently 
unclear.  Recently,  we  examined  the  expression  and  localization 
of  E-cad  and  EGFR  in  both  SCCHN  tissue  specimens  and 
cell  line  models  and  found  that  not  only  expression  but  also 
localization  of  EGFR  and  E-cad  had  clinical  relevance  in 
predicting  lymph  node  metastasis  and  patient  survival  [36]. 
Therefore, quantification of EGFR and E-cad localization at
a basal level and in response to EGFR ligands and EGFR-TKIs
should facilitate our understanding of the mechanism of
resistance to EGFR-targeted therapy.


The current study reports the use of QD-based immunocytochemistry
(QD-ICC) per-cell quantification analyses to
study the expression and subcellular localization of EGFR
and E-cad. QD-based quantification allows comparison of the
expression of these two proteins between EGFR-TKI-sensitive
and -insensitive NSCLC and SCCHN cell lines, thereby
elucidating one mechanism for cellular resistance to
EGFR-targeted therapy and providing a basis for the prediction of
response to EGFR-targeted therapy.

2.  Materials  and  methods 

2.1.  Cell  lines 

NSCLC cell lines H1703, H460, H292 and H322 were
kindly  provided  by  Dr  Shi-Yong  Sun  (Emory  University 
Winship  Cancer  Institute,  Atlanta  GA);  H460  and  H1703  are 
EGFR-TKI-insensitive  and  H292  and  H322  are  -sensitive  cell 
lines  [25].  The  SCCHN  cell  line  686LN  was  established 
from  a  lymph  node  metastasis  of  a  primary  base  of  tongue 
SCC.  686LN-M4e  is  a  highly  metastatic  cell  line  generated 
by  in  vivo  selection  from  686LN  that  has  low  metastatic 
potential  in  the  lymph  node  of  the  nude  mouse  as  described 
previously  [37].  The  SCCHN  cell  line  686LN-R30  is  an 
EGFR-TKI-insensitive  cell  line  established  from  686LN  by 
single  cell  cloning  after  challenging  with  gradually  increased 
concentrations  of  gefitinib.  Additional  SCCHN  cell  lines 
UPCI-37A  and  -37B  were  established  from  larynx  (epiglottis) 
at  the  University  of  Pittsburgh  Cancer  Institute  (Pittsburgh, 
PA);  UPCI-37A  was  from  a  primary  tumor,  while  UPCI-37B 
was  from  lymph  node  metastases.  These  cell  lines  were 
maintained  as  monolayer  cultures  in  RPMI  1640  medium 
(NSCLC  cells)  or  DMEM/F12  50/50  medium  (SCCHN  cells) 
supplemented  with  10%  heat-inactivated  fetal  bovine  serum 
(FBS). All cells were maintained in a humidified incubator at
37 °C, 5% CO2.

2.2. QD-based immunocytochemistry (QD-ICC)

The cells were seeded onto an 8-well chamber slide (Lab-Tek
Permanox™ slide, Rochester, NY) and starved for 24 h
(in FBS-free medium). The cells were then incubated with
or without erlotinib (0-2.5 μM) for 2 h and stimulated with
100 ng ml⁻¹ EGF (Invitrogen, Carlsbad, CA) for 30 min
at 37 °C. The cells were fixed with 4% paraformaldehyde,
permeabilized with 0.25% Triton X-100/PBS for 10 min,
blocked with 10% goat serum, and incubated with primary
antibodies, rabbit anti-EGFR (clone 1005, 1:200 dilution,
Santa Cruz Biotechnology, Santa Cruz, CA) and mouse anti-E-cad
(clone 36, 1:400 dilution, BD Biosciences, Franklin Lakes,
NJ) simultaneously. After washing with phosphate-buffered
saline (PBS), the cells were incubated with QD-secondary
antibody conjugates (QD 605 goat F(ab')2 anti-rabbit IgG; QD
565 goat F(ab')2 anti-mouse IgG, 1:100 dilution, Invitrogen,
Carlsbad, CA) in a cocktail solution at 37 °C (figure 1).
These QDs are made of semiconductor materials, including
cadmium mixed with selenium or tellurium, which has been
coated with an additional semiconductor shell (zinc sulfide) to
improve the optical properties of the material. Cell nuclei were
counterstained using 4',6-diamidino-2-phenylindole (DAPI,
Invitrogen, Carlsbad, CA). Mouse and rabbit IgG were used
as negative controls.

Figure 1. (A) Schematic diagram of the overall structure of a QD-secondary antibody conjugate. The layers represent the distinct structural
elements of the QD nanocrystal conjugates, and are roughly to scale (adapted from Invitrogen). (B) TEM image of core-shell QD
nanoparticles at 200 000× magnification (adapted from Invitrogen). (C) Cartoon showing cocktail QD-based immunocytochemistry
(QD-ICC) with QD-secondary antibody conjugates. Several proteins presented as A, B and C can be detected simultaneously by specific
primary antibodies plus appropriate secondary antibodies conjugated with QDs.

Figure 2. Immunoblotting of cancer cells treated with erlotinib. Lung cancer cell lines H292 and H1703, head and neck cancer cell lines
686LN, 686LN-M4e, 686LN-R30, UPCI-37A and -37B were treated with erlotinib at concentrations of 0.5 and 10 μM for 72 h. G3PDH
served as a loading control.

For tracking the endosome and lysosome distribution of
EGFR in both EGFR-TKI-sensitive and -insensitive cell lines,
the cells were stimulated with 100 ng ml⁻¹ EGF for 30,
60 or 120 min at 37 °C. The mixed primary antibodies
were rabbit anti-EGFR (clone 1005, 1:200 dilution, Santa
Cruz Biotechnology, Santa Cruz, CA) plus mouse anti-EEA1,
an early endosome marker (clone 10, 1:800 dilution, BD
Biosciences, Franklin Lakes, NJ), or mouse anti-CD63, a late
endosome/lysosome marker (clone H5C6, 1:800 dilution, BD
Biosciences, Franklin Lakes, NJ). The staining procedures
were the same as described above.

2.3.  QD  spectral  imaging  and  signal  quantification 

An Olympus IX71 microscope with a CRi Nuance spectral
imaging and quantifying system (CRi Inc., Woburn, MA) was
used to observe and quantify the QD signal. All cubed image
files were collected from the cell slides at 10 nm wavelength
intervals from 500 to 800 nm, with an auto exposure time per
wavelength interval, at 400× magnification. Taking the cube
with a long-wavelength bandpass filter allowed transmission
of all emission wavelengths above 450 nm. Both separated
and combined QD images were established after determining
the QD spectral library and unmixing the cube. We removed
background for accurate quantification of the QD signals. For
quantification of the QD signal on cellular membranes and in
the cytoplasm with Nuance software, we obtained both total
and manually marked membrane QD signals, which showed the
correct QD wavelength, in 50 cells from 10 randomly selected
fields on the cell slides. The signal unit (au) was defined as the
average fluorescence signal intensity per exposure time (ms),
which was obtained from the Nuance software. The relative
EGF-induced internalization of EGFR or E-cad was defined
as [1 - (membrane signal with EGF/membrane signal without
EGF)] × 100%.
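
As a worked illustration of the relative internalization formula above, the following minimal Python sketch applies it to a single pair of membrane signals. The function name `relative_internalization` is a hypothetical helper introduced here for illustration, not part of the Nuance software. The example inputs (0.312 au without EGF and 0.121 au with EGF) are the values listed for H1703 in table 1 and are used only to show the arithmetic.

```python
def relative_internalization(membrane_with_egf, membrane_without_egf):
    """Return [1 - (with EGF / without EGF)] x 100, in percent."""
    if membrane_without_egf <= 0:
        raise ValueError("membrane signal without EGF must be positive")
    return (1.0 - membrane_with_egf / membrane_without_egf) * 100.0


# Membrane signals in au (average signal intensity per ms of exposure).
signal_without_egf = 0.312
signal_with_egf = 0.121
percent_internalized = relative_internalization(signal_with_egf, signal_without_egf)
print(f"{percent_internalized:.1f}%")  # 61.2%
```

A value near 0% means the membrane signal was unchanged by EGF, while a value near 100% means almost all of the membrane signal was lost after stimulation.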

2.4.  Immunoblotting  analysis 

Immunoblotting  was  performed  as  described  in  our  previous 
studies  [37,  38].  Primary  antibodies  for  immunoblotting  were 
monoclonal  antibodies  against  E-cad  (clone  G-10,  1:1000 
dilution),  polyclonal  antibodies  against  phospho-EGFR  (Tyr 
1045,  1:500  dilution),  and  EGFR  (clone  1005,  1:500  dilution) 
(Santa  Cruz  Biotechnology,  Santa  Cruz,  CA).  An  antibody 
against  G3PDH  (1:3000  dilution,  Trevigen,  Inc.,  Gaithersburg, 
MD)  was  used  as  an  internal  control. 

2.5.  Fluorescence-activated  cell  sorting  (FACS) 

After the cells were starved for 24 h, erlotinib (0.1-2.5 μM) was
added to the cells 2 h before stimulation with EGF-Alexa
Fluor-488 (100 ng ml⁻¹, Invitrogen, Carlsbad, CA) for 30 min.
Then the cells were washed with acetic acid (0.2 M, plus
0.5 M NaCl, pH 2.8) to remove the uninternalized membrane
receptor, and suspended in 2% bovine serum albumin (BSA)
with 0.05% NaN3 in PBS. FACS was used to examine EGFR
internalization. Cells incubated with EGF-488 at 4 °C were
used as the negative control. Relative internalization was
defined as (FITC+ cells/total cells) × 100%.
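
The FACS-based definition above is a simple percentage of positive events. A minimal Python sketch of that calculation follows, assuming the positive gate has already been set against the 4 °C negative control. The function name `percent_positive` and the event counts are hypothetical and are not data from this study.

```python
def percent_positive(positive_cells, total_cells):
    """Return (FITC+ cells / total cells) x 100, in percent."""
    if total_cells <= 0:
        raise ValueError("total cell count must be positive")
    if not 0 <= positive_cells <= total_cells:
        raise ValueError("positive cells must be between 0 and total cells")
    return positive_cells / total_cells * 100.0


# Hypothetical event counts from one FACS run (illustrative only).
fitc_positive = 7300
total_events = 10000
print(f"{percent_positive(fitc_positive, total_events):.2f}%")  # 73.00%
```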

2.6.  Statistical  analysis 

All results represent the average of at least three separate
experiments and are expressed as mean ± SD unless otherwise
indicated. Statistical analysis was performed using a t-test.
P < 0.05 was considered statistically significant.

3.  Results  and  discussion 

3.1.  Basal  level  and  localization  of  EGFR  and  E-cad 

To understand EGFR-TKI resistance, we tested 9 lung and
head and neck cancer cell lines: H1703, H460, H292, H322;
686LN, 686LN-R30, 686LN-M4e; UPCI-37A and -37B.
Among them, H292, H322, 686LN, and UPCI-37A are sensitive
to EGFR-TKI, and H1703, H460, 686LN-R30, 686LN-M4e,
and UPCI-37B are insensitive cell lines. Alterations in the
levels of E-cad, p-EGFR, and total EGFR in the presence or
absence of erlotinib were studied by immunoblotting (figure 2).
The data showed that although p-EGFR levels were reduced
by 0.5 μM erlotinib in the EGFR-TKI-insensitive cell lines
H1703, 686LN-M4e, 686LN-R30 and 37B, almost no
growth inhibition was observed in these cells after treatment
with erlotinib at this concentration (data not shown). Thus,
reduction of activated EGFR did not correlate with growth
inhibition by erlotinib in the EGFR-TKI-insensitive cell lines.
Furthermore, the insensitive cell lines had lower total levels of
EGFR and E-cad than the sensitive cell lines.

Since immunoblotting can only show the total level of
each protein, QD-ICC combined with its interrelated imaging
and quantification system was used to obtain the quantified
colocalization of the related proteins in the same sample.
Figure 3 shows membrane and cytoplasmic distribution of
E-cad and EGFR in EGFR-TKI-sensitive and -insensitive cells.
Results of the quantification show that the mean of the average
E-cad membrane signal in four sensitive cell lines was 0.512 ±
0.110 au, while that in five insensitive cell lines was only 0.307 ±
0.055 au (P < 0.008; table 1). The mean of the average EGFR
membrane signal in the sensitive cell lines was 1.413 ± 0.448 au
compared with 0.443 ± 0.076 au in insensitive cells (P <
0.002; table 1). QD-based quantification also confirmed that
not only membrane but also total protein levels of both EGFR
and E-cad were lower in the insensitive cell lines than in the
sensitive cell lines (data not shown).
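
The group summaries above are means and standard deviations taken over the per-cell-line average signals. The short Python sketch below shows that calculation for the five EGF = 0 entries listed for the insensitive lines in table 1. It assumes the sample standard deviation (n - 1 denominator), which the text does not state explicitly. Under that assumption the result comes out close to the 0.307 ± 0.055 au summary quoted above.

```python
from statistics import mean, stdev

# Per-cell-line average membrane signals (au) at EGF = 0 for the five
# insensitive lines, in the order H1703, H460, R30, M4e, 37B (table 1).
insensitive_signals = [0.312, 0.222, 0.377, 0.320, 0.305]

group_mean = mean(insensitive_signals)
# Sample SD (n - 1 denominator); this choice is an assumption.
group_sd = stdev(insensitive_signals)
print(f"{group_mean:.3f} +/- {group_sd:.4f} au")
```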

These observations are consistent with our recent studies
on human tissues and animal and cell line models of head and
neck cancer, which identified three populations of tumor cells,
including those with high membrane expression of EGFR and
E-cad and those with low and mostly cytoplasmic expression
of EGFR and E-cad [36]. To further understand these cells, we
asked two fundamental questions: (i) how do these cells respond
to EGFR ligands such as EGF, and (ii) what happens when an
EGFR-TKI is applied to the EGFR-activated cells?

[Figure 3 panel labels: EGFR-TKI insensitive cells; EGFR-TKI sensitive cells]

Figure 3. (A) Colocalization of EGFR (QD605, red) and E-cad
(QD565, green) in the absence or presence of EGF. EGFR and E-cad
expression levels were imaged by an Olympus IX71 microscope with
a CRi spectral imaging and quantifying system. Yellow indicates
colocalization of EGFR (red) and E-cad (green). (B) Spectral image
of positive staining (including QDs 565, QDs 605, and DAPI signals)
and negative staining (only background and DAPI signals).

The cocktail solution for QD-secondary antibody conjugates
was prepared in PBS, as recommended by the QD
manufacturer, Invitrogen Corporation; PBS is suggested not to
affect the stability of the QDs. Our study confirmed that the
QD signals in PBS appeared in the correct wavelength with
reasonable sensitivity (see figure 3). Although there may be
some signal interference between the two QDs, our objective is
to compare the signals of QD605 (EGFR) and QD565 (E-cad)
between different cell lines or between different treatments,
not between QD565 and QD605; thus, any alteration in signal
level due to the cocktail solution is unlikely to affect our
experimental results.

Figure 4. (A) Comparison of internalization of E-cad and EGFR induced by EGF in EGFR-TKI-insensitive and -sensitive cell lines measured
by a CRi Nuance system. Relative internalization was defined as [1 - (membrane EGFR with EGF/membrane EGFR without EGF)] × 100%.
(B) Comparison of EGF-induced EGFR internalization measured by FACS. Relative internalization was defined as (FITC+ cells/total cells)
× 100%.

3.2.  Response  to  EGF-induced  EGFR  internalization 

In order to characterize the differences between these two types
of cell lines, relocalization of EGFR and E-cad was quantified
by different QD signals simultaneously after induction with
EGF. We found that both EGFR and E-cad internalized to
the cytoplasm in EGFR-TKI-sensitive cell lines upon addition
of EGF. In contrast, in insensitive cells, these dynamic
changes were not clearly observed (figure 3). Quantification
showed that the capability of EGF to induce EGFR and
E-cad internalization was much greater in sensitive cells than in
insensitive cells. In detail, the percentage of relative EGFR
internalization in the sensitive cells was 1.39-2.21-fold greater
and the E-cad internalization was 1.65-2.00-fold greater than
that in the insensitive cells (figure 4(A)). EGFR internalization
induced by EGF was confirmed with conventional FACS
(figure 4(B)), which showed that in sensitive cells, the
percentage of relative internalization of EGFR was
72.97-97.99%, compared with only 42.82-58.59% in insensitive
cells. These results are similar to those from the QD-ICC
analysis.

3.3. Erlotinib inhibition of EGF-induced EGFR
internalization

QD quantification showed that erlotinib at 0.5 μM inhibited
EGF-induced EGFR internalization by 30.90-63.59% in the
sensitive cells as compared with the untreated control, whereas
in the insensitive cells the inhibitory effect was only
5.28-11.47% (figures 5 and 6(A)). Erlotinib at 0.5 μM also
had a significant inhibitory effect on E-cad internalization
in sensitive cells. Quantified QD-ICC showed that in
the sensitive cells, the inhibition of EGF-induced E-cad
internalization was 30.03-46.62%, compared with 3.52-9.60%
in the insensitive cells (figure 6(A)). FACS analysis also
confirmed that inhibition of EGFR internalization by erlotinib
was dose-dependent in both EGFR-TKI-insensitive and
-sensitive cell lines (figures 5 and 6(B)), but the inhibition was
much stronger in sensitive than in insensitive cells.

EGFR-targeted  therapies  have  been  both  tested  in  clinical 
trials  and  used  in  clinical  practice.  Among  them,  erlotinib 
Table 1. Quantified results of E-cad and EGFR membrane signals.
(Note: Average membrane signals of E-cad (QD565) and EGFR
(QD605) between EGFR-TKI-insensitive (H1703, H460, R30, M4e,
and 37B) and -sensitive (H292, H322, 686LN, and 37A) cell lines
were determined manually with CRi Nuance software as described in
the Materials and methods section. au = fluorescence average signal
intensity per exposure time (ms) (Avg. signal/exp).)

                            EGF = 0                       EGF = 100
1st Ab   Cell line   Avg. signal/exp.   STD.      Avg. signal/exp.   STD.
                          (au)          (au)           (au)          (au)

EGFR     H1703           0.312         0.018          0.121         0.007
         H460            0.222         0.009          0.105         0.013
         R30             0.377         0.012          0.268         0.047
         M4e             0.320         0.011          0.191         0.030
         37B             0.305         0.015          0.204         0.021
         H292            0.574         0.050          0.077         0.024
         H322            0.440         0.019          0.103         0.026
         686LN           0.634         0.080          0.170         0.052
         37A             0.401         0.050          0.115         0.047

E-cad    H1703           0.565         0.111          0.208         0.083
         H460            0.465         0.032          0.202         0.024
         R30             0.385         0.074          0.260         0.058
         M4e             0.412         0.145          0.287         0.050
         37B             0.387         0.012          0.266         0.004
         H292            1.867         0.116          0.102         0.032
         H322            1.463         0.070          0.273         0.022
         686LN           1.526         0.152          0.385         0.066
         37A             0.796         0.094          0.323         0.079

is  orally  bioavailable  and  has  various  effects  on  tumor  cells 
expressing  EGFR.  It  can  inhibit  phosphorylation  of  EGFR, 
ERK  and  AKT  and  induce  G1  arrest  and  apoptosis.  Phase  III 
clinical  trials  have  demonstrated  its  efficacy  in  inhibiting  tumor 
progression  [39].  However,  the  response  rate  to  erlotinib  or 
other EGFR-targeted therapies is limited [19, 40], around
10-20% [41]. Therefore, pre-selection of those patients who may
benefit  most  from  EGFR-targeted  therapies  is  necessary.  The 
current  challenge  is  to  define  sensitive  biomarkers  and  develop 
reliable  methods  to  predict  EGFR-targeting  sensitivity. 

Expression  of  the  EMT  biomarker  E-cad  correlates  with 
tumor  progression  and  metastasis  [27-33]  and  has  been 
reported  to  be  related  to  a  reduced  sensitivity  to  EGFR- 
TKI  [25].  Currently  the  effect  of  the  molecular  relationship 
between  EGFR  and  E-cad  on  EGFR-targeting  therapy  is 
unclear.  Our  recent  findings  showed  that  not  only  expression 
but  also  localization  of  EGFR  and  E-cad  had  clinical  relevance 
in  predicting  lymph  node  metastasis  and  patient  survival  [36]. 
Lee  et  al  found  that  EGF  treatment  downregulates  E-cad  and 
upregulates  vimentin  in  cervical  cancer  cells  [42].  Lo  et  al 
also  reported  that  EGF  reduced  E-cad  expression  and  increased 
that  of  mesenchymal  proteins  [43].  Rho  et  al  indicated  that 
induction  of  EMT  may  contribute  to  the  decreased  efficacy  of 
therapy  in  primary  and  acquired  resistance  to  gefitinib  [44]. 
In  this  study,  we  demonstrated  quantitatively  that  EGFR  and 
E-cad  internalization  mediated  by  EGF  and  inhibition  of  this 
internalization  by  erlotinib  were  greater  in  the  sensitive  cells 
than  in  the  insensitive  cells.  These  quantifications  of  EGFR  and 


Figure  5.  Localization  of  EGFR  (QD605,  red)  and  E-cad  (QD565, 
green)  induced  by  EGF  with  or  without  erlotinib.  Images  were  taken 
by an Olympus IX71 microscope with a CRi spectral imaging and
quantifying  system.  Yellow  indicates  colocalization  of  EGFR  (red) 
and  E-cad  (green). 


E-cad  localization  in  response  to  EGFR  ligands  and  EGFR- 
TKI  will  facilitate  our  understanding  of  the  mechanism  of 
resistance  to  EGFR-targeted  therapy. 

Several  mechanisms  have  been  considered  to  contribute 
to  cancer  cell  resistance  to  EGFR-targeted  therapy  in  lung 
and  head  and  neck  cancer,  including  overexpression  of  EGFR 
ligand [11], increased EGFR gene copy numbers [45, 46],
overexpression  of  other  members  of  the  EGFR  family  [47],  and 
the  existence  of  the  EGFRvIII  mutation  [48].  In  addition,  in 
lung  cancer,  a  secondary  mutation  in  the  EGFR  gene,  T790M, 
and  amplification  of  the  MET  proto-oncogene  were  also 
suggested  to  contribute  to  EGFR-targeting  resistance  [49,  50]. 
Our  recent  study  of  human  head  and  neck  cancer  tissues 
has  identified  an  additional  possibility:  that  the  cytoplasmic 
localization  of  EGFR,  rather  than  EGFR  expression,  along 
Figure 6. (A) Comparison of erlotinib-mediated inhibition of EGF-induced internalization of EGFR and E-cad quantified by a CRi Nuance
system. Relative internalization was defined as [1 - (membrane EGFR induced with EGF/membrane EGFR without EGF and erlotinib)]
× 100%. (B) Comparison of erlotinib-mediated inhibition of EGF-induced EGFR internalization measured by FACS. Erlotinib was used at a
range of concentrations as indicated in the figure. Relative internalization was defined as (FITC+ cells/total cells) × 100%.

[Figure 6 legend: erlotinib = 0, 0.1, 0.5, 2.5 and 12.5 μM. Panels: EGFR-TKI insensitive cells; EGFR-TKI sensitive cells.]

with  reduction  of  E-cad  may  explain  the  clinical  findings 
of  poor  response  to  EGFR-TKI  and  particularly,  therapeutic 
antibody  against  the  extracellular  portion  of  EGFR  [36].  In  the 
current  study,  all  of  the  lung  cancer  cell  lines  contain  wild  type 
EGFR  [25].  We  also  sequenced  the  head  and  neck  cancer  cell 
lines  686LN,  686LN-M4e,  and  686LN-R30  and  found  neither 
gain  of  function  nor  loss  of  function  mutations  in  these  cell 
lines  (data  not  shown).  Therefore,  the  resistance  phenotype 
of  these  cell  lines  is  not  derived  from  EGFR  gene  mutations. 
Rather,  this  study  supports  our  new  explanation  for  EGFR- 
targeting  resistance. 

Using  QD-based  ICC  and  CRi  spectral  imaging  software, 
we  have  developed  a  quantification  method  to  record  dynamic 
processes  in  cancer  cells  in  response  to  the  EGFR  ligand 
EGF  and  erlotinib.  The  quantification  results  are  consistent 
with  those  obtained  by  FACS,  but  QD  imaging  has  the 
advantage  over  FACS  of  providing  visualization  of  the  cellular 
localization  of  the  proteins  studied.  In  this  study,  we  have 
clarified  three  important  features  of  lung  and  head  and  neck 
cancer  cells  that  are  insensitive  to  EGFR-TKI,  at  least  in 
this  population.  First,  these  cells  have  lower  levels  of 
membrane  and  total  EGFR  than  sensitive  cells.  Second,  the 
insensitive  cells  showed  lower  levels  of  EGFR  internalization 
induced  by  EGF  than  the  sensitive  cells,  suggesting  the 
biological  activity  of  these  cells  may  not  rely  mainly  on  EGFR 
ligand-mediated  signal  transduction.  Third,  our  previous 


publication  and  others  have  shown  that  EGFR-TKI  inhibits 
EGFR  internalization  induced  by  EGF  [41,  51].  These  results 
suggest  that  quantification  of  membrane  expression  of  EGFR 
and  E-cad  may  serve  as  biomarkers  in  predicting  the  efficacy 
of  EGFR-targeted  therapy,  at  least  for  one  population  of  lung 
and  head  and  neck  cancer  patients.  Further  clarification 
of  the  substantial  differences  between  EGFR-TKI-insensitive 
and  -sensitive  cancer  cells  will  help  to  define  the  mechanism 
of  resistance  to  EGFR-targeted  agents  and  facilitate  the 
development  of  new  targeted  therapies. 

3.4.  Subcellular  localization  of  EGFR  in  the  endosome  and 
lysosome 

After  the  binding  of  EGF,  EGFR  dimerizes,  autophospho- 
rylates  and  then  internalizes.  Concomitantly,  these  ligand- 
receptor  complexes  cluster  into  clathrin-coated  pits,  internalize 
into  early  endosomes,  and  then  either  recycle  back  to  the  cell 
surface  or  eventually  traffic  to  lysosomes  for  degradation  [52]. 
Nishimura  et  al  demonstrated  efficient  endocytosis  of  the 
EGF-EGFR  complex  and  rapid  endocytosis  of  phosphorylated 
EGFR  via  the  early/late  endocytic  pathway  in  the  PC9 
NSCLC cell line [51, 53]. Strong evidence indicates that
endosome-localized  EGFR  plays  an  important  role  in  cell 
signaling  [51,  52,  54]. 
[Figure 7 panels: EGFR + Endosome; EGFR + Lysosome]

Figure 7. Colocalization of EGFR with the early endosome and lysosome. EGFR was tracked with QD605 (red), and the early endosome
and late endosome/lysosome were tracked with QD565 (green). Yellow indicates the colocalization of EGFR with early endosomes or
late endosomes/lysosomes.


To  further  understand  the  differences  in  intracellular 
distribution of EGFR between EGFR-TKI-sensitive and
-insensitive cell lines, NSCLC cell lines H292, H322, H1703,
and H460 were double-stained with antibodies specific to
EGFR and either the early endosome marker EEA1 or the
late endosome/lysosome marker CD63. EEA1 and CD63
are distributed within the endocytic organelles at a high concentration
in early endosomes and late endosomes/lysosomes,
respectively.  Using  the  QD-ICC  method,  we  compared  EGFR 
subcellular  localization  in  endosomes  and  lysosomes  between 
the  sensitive  and  insensitive  cancer  cells  after  stimulating 
with  EGF  at  different  time  points.  In  the  absence  of  EGF, 
the  majority  of  EGFR  was  colocalized  within  large  swollen 
vacuoles  in  the  perinuclear  region  in  insensitive  cells,  while 
in  the  sensitive  cells,  EGFR  staining  was  found  mainly  on 
the  cell  membrane.  In  the  absence  of  EGF,  EGFR  did 
not  colocalize  with  either  the  early  endosomes  or  the  late 
endosomes/lysosomes  in  either  the  sensitive  or  insensitive 
cancer  cell  lines.  After  30  min  stimulation  by  EGF,  EGFR  was 
mainly  colocalized  with  the  early  endosomes  in  both  cell  types. 
In  most  sensitive  cells,  EGFR  colocalized  mainly  with  early 
endosomes 60 min after EGF stimulation, and was recycled
back  to  the  cell  membrane  after  60-120  min,  with  only  a 
little  EGFR  colocalized  with  late  endosomes/lysosomes.  In 
contrast,  in  most  of  the  insensitive  cells,  EGFR  was  colocalized 
with  late  endosomes/lysosomes  as  early  as  30  min  after  EGF 
stimulation,  and  colocalization  was  retained  up  to  120  min 
(figure  7). 

Our results demonstrate that in the insensitive cells,
EGFR was distributed mainly in late endosomes/lysosomes,
where lysosomes complete their maturation by fusing with
late endosomes. In contrast, in the sensitive
cells, after stimulation with EGF, EGFR internalized and
trafficked from the membrane via the early endosomes, then
mostly back to the membrane (major route) and only partly to the
late endosomes/lysosomes (minor route). These results show that
endocytosis of the EGF-EGFR complex occurs via different
endocytic pathways in EGFR-TKI-sensitive versus -insensitive
cancer cells.

4.  Conclusions 

In  summary,  our  study  provides  a  potential  new  strategy 
for  using  a  QD  methodology  in  the  prediction  of  sensitivity 
to  EGFR-targeted  therapy.  There  are  many  advantages 
in  using  QD-based  image  analysis  instead  of  conventional 
fluorescent  dyes:  (i)  the  fluorescent  signal  generated  by  QDs 
is  more  stable  than  that  of  organic  dyes,  and  QDs  are  more 
resistant  to  photobleaching;  (ii)  QDs  have  broad  excitation 
spectra  and  narrow  emission  spectra  as  compared  with  organic 
dyes,  facilitating  quantification  of  the  image;  (iii)  through 
the  judicious  choice  of  an  appropriate  excitation  source, 
multiple  color  QDs  may  be  excited  simultaneously,  allowing 
quantification  of  multiple  biomarkers  on  the  same  sample. 

In  this  study,  QD-based  quantification  methodologies 
facilitated  the  analysis  of  subcellular  distributions  of  multiple 
biomarkers,  EGFR  and  E-cad,  providing  new  biomarkers  for 
the  prediction  of  sensitivity  to  EGFR-targeted  therapy,  which 
can  be  further  developed  for  clinical  application  to  tumor  tissue 
specimens.  Our  findings  highlight  substantial  differences 
between EGFR-TKI-sensitive and -insensitive cancer cells in
the  cellular  localizations  of  EGFR  and  E-cad,  which  will 
help to define the mechanisms of resistance to EGFR-targeted
agents.  Quantification  of  multiple  proteins  by  QDs  may  help 
to  monitor  the  effect  of  EGFR-targeted  therapy  on  EGFR 
downstream  signaling  molecules  as  well  as  EGFR  parallel 
pathways  which  may  serve  as  new  therapeutic  targets  for 
treatment  of  cancer. 
ABSTRACT

Motivation:  Reverse  phase  protein  arrays  (RPPA)  measure 
the  relative  expression  levels  of  a  protein  in  many  samples 
simultaneously.  A  set  of  identically  spotted  arrays  can  be  used  to 
measure  the  levels  of  more  than  one  protein.  Protein  expression 
within  each  sample  on  an  array  is  estimated  by  borrowing  strength 
across all the samples, but using only within-array information. When
comparing  across  slides,  it  is  essential  to  account  for  sample  loading, 
the  total  amount  of  protein  printed  per  sample.  Currently,  total  protein 
is  estimated  using  either  a  housekeeping  protein  or  the  sample 
median  across  all  slides.  When  the  variability  in  sample  loading  is 
large,  these  methods  are  suboptimal  because  they  do  not  account 
for  the  fact  that  the  protein  expression  for  each  slide  is  estimated 
separately. 

Results:  We  propose  a  new  normalization  method  for  RPPA  data, 
called  variable  slope  (VS)  normalization,  that  takes  into  account  that 
quantification  of  RPPA  slides  is  performed  separately.  This  method 
is  better  able  to  remove  loading  bias  and  recover  true  correlation 
structures  between  proteins. 

Availability: Code to implement the method in the statistical
package R and anonymized data are available at
http://bioinformatics.mdanderson.org/supplements.html.

Contact:  sneeley@stats.byu.edu 

Supplementary  information:  Supplementary  data  are  available  at 
Bioinformatics  online. 

1  INTRODUCTION 

Protein  arrays  have  been  used  in  many  contexts  to  measure 
protein  expression  in  a  high-throughput  format  (Becker  et  al. 
2006;  Grote  et  al.  2008;  Hennessy  et  al.  2007;  Herrmann  et  al. 
2003;  Kornblau  et  al.  2009;  Kreutzberger  2006;  Park  et  al.  2008). 
Assays  that  measure  protein  are  able  to  address  questions  about 
post-translational  modifications  and  protein  pathway  relationships 
that  genomic  studies  alone  cannot  answer  (Nishizuka  et  al.  2003b). 
Several  different  protein  array  formats  have  been  developed,  but 
they  can  be  dichotomized  into  forward  and  reverse  phase  assays 
(Liotta  et  al.  2006).  In  forward  phase  arrays,  numerous  capture 
antibodies  are  printed  on  the  array,  which  is  then  exposed  to  a 
single  protein  sample,  allowing  the  simultaneous  measurement  of 

*To  whom  correspondence  should  be  addressed. 


the  level  of  multiple  targets  in  a  single  sample.  In  reverse  phase 
arrays,  numerous  protein  samples  are  printed  in  discrete  spots  on 
the  array,  which  is  then  probed  with  a  single  validated  antibody, 
simultaneously  measuring  the  level  of  a  single  protein  in  multiple 
samples.  One  reverse  phase  approach  that  uses  lysed  homogenized 
samples  is  the  protein  lysate  or  reverse  phase  protein  array  (RPPA) 
first  described  by  Paweletz  et  al.  (2001).  Since  then,  RPPAs  have 
been  used  by  several  groups  worldwide  to  study  the  protein  behavior 
in diseases (Chan et al. 2004; Herrmann et al. 2003; Jiang et al. 2006;
Korf  et  al.  2008;  Kornblau  et  al.  2009;  Mendes  et  al.  2007;  Park 
et  al.  2008;  Stevens  et  al.  2008;  Zhang  et  al.  2009). 

RPPAs  have  been  used  to  address  a  number  of  biological 
questions.  For  example,  RPPAs  were  used  to  study  proteomic 
signatures  of  signaling  pathways  in  various  types  of  cancer  including 
prostate  (Grubb  et  al.,  2003;  Paweletz  et  al.,  2001),  breast  (Akkiprik 
et  al.,  2006),  glioma  (Jiang  et  al.,  2006),  follicular  lymphoma 
(Gulmann et al., 2005) and leukemia (Kornblau et al., 2009). Calvert
et al. (2007), Nishizuka et al. (2003a) and Mendes et al. (2007)
all found protein signatures that were able to distinguish between
diseases or subtypes of disease. Other studies that use RPPAs to
study proteins relating to pathway dysregulation or drug response
in cancer or other diseases include Ma et al. (2006), Wulfkuhle
et al. (2003), Nishizuka et al. (2003b), Chan et al. (2004), Zha et al.
(2004), Shankavaram et al. (2007) and Kim et al. (2008).

The  RPPA  assay  is  described  in  detail  in  Paweletz  et  al.  (2001) 
(see  also  Charboneau  et  al.,  2002;  Espina  et  al.,  2004;  Liotta  et  al., 
2006;  Tibes  et  al.,  2006).  Briefly,  biological  samples  are  lysed, 
resulting  in  solutions  that  contain  the  protein  of  interest  in  unknown 
amounts.  These  sample  lysates  are  spotted  onto  a  nitrocellulose 
backed  array  in  a  dilution  series.  The  array  is  then  hybridized 
with  a  specific  antibody  validated  to  recognize  only  the  protein  of 
interest.  Next,  the  array  is  incubated  with  a  biotinylated  secondary 
antibody  that  recognizes  and  binds  to  the  primary  antibody.  Finally, 
streptavidin-linked labels (such as dyes) are introduced and bound
to  the  biotin.  When  the  array  is  processed,  the  labels  can  be  observed 
and  measured.  It  is  assumed  that  the  amount  of  label  corresponds 
to  the  amount  of  protein  at  the  spot.  The  processed  arrays  are 
scanned  and  the  resulting  images  are  analyzed  with  array  software 
(we use MicroVigene®, VigeneTech, Carlisle, MA) that measures
the  foreground  and  background  intensities  of  the  label  at  each  spot. 
RPPA  ‘raw  data’  consists  of  these  measurements  of  foreground  and 
background  intensity  at  each  spot  on  the  array.  Figure  1  shows  an 
Fig. 1. Image of an example RPPA with 1152 separate dilution series. Each
dilution  series  is  printed  in  5-spot  1/2  dilutions.  The  zoom-in  box  shows  12 
dilution  series  on  the  array.  The  darker  the  spot,  the  higher  the  amount  of 
protein.  Some  of  the  spots  do  not  appear  on  the  array  or  in  the  zoom-in  box 
because  there  was  no  label  (i.e.  protein)  at  the  spot. 

example  of  an  RPPA  slide  with  details  of  a  few  samples  in  their 
dilution  series.  The  darker  spots  contain  more  protein  than  the  lighter 
spots. 

The  reverse  phase  nature  of  RPPAs  allows  the  levels  of  only  one 
protein  to  be  measured  per  array.  Thus,  RPPA  experiments  involve 
multiple  arrays  each  printed  identically  with  the  same  samples  but 
probed  with  different  antibodies.  Such  a  set  of  arrays  allows  for 
estimation  of  sample  effects  that  can  go  undetected  with  one  array. 

Similar to other array formats, these data undergo a series
of  preprocessing  steps  before  a  formal  analysis.  The  three  main 
preprocessing  steps  are  background  subtraction,  quantification  and 
normalization.  RPPA  processing  steps  are  performed  sequentially: 

(1) Background correction: the background spot intensities are
used  to  subtract  baseline  or  non-specific  signal  from  the 
foreground  spot  intensities. 

(2) Quantification: the background adjusted spot intensities from
each  dilution  series  are  mapped  into  one  number,  the  protein 
expression,  that  represents  the  amount  of  protein  in  the  sample 
relative  to  the  other  samples  on  the  array. 

(3)  Normalization:  the  estimated  relative  sample  expressions  are 
adjusted  to  account  for  known  sources  of  variation. 


In  this  article,  we  focus  on  the  normalization  step.  Specifically,  we 
discuss  current  practices  for  estimating  and  correcting  array  and 
sample  effects,  with  more  focus  on  sample  effects.  We  also  propose 
a  new  normalization  model  that  corrects  array  and  sample  effects 
based  on  the  assumption  that  the  protein  expression  estimates  from 
each  array  are  potentially  on  slightly  different  scales  due  to  random 
variability  in  the  quantification  step. 

Array and sample effects are assumed to be additive on the log
scale:

$$x_{jp} = \lambda_j + \delta_p + c_{jp} \qquad (1)$$

where $x_{jp}$ is the estimated relative log expression in sample $j$ on
array $p$, $\lambda_j$ is the effect due to sample $j$, $\delta_p$ is the effect due to array $p$
and $c_{jp}$ is the relative protein log expression with sample and array
effects removed. Array effects are due to the fact that each array is
quantified separately and protein expression is relative within slides.
Sample effects occur when different amounts of total protein are
unintentionally spotted on the array for different samples. Array and
sample effects are further discussed in the next section.

The model we propose is a slight variation. The following simple
modification to Equation (1) can improve normalized results when
there is large variation in the sample effects:

$$x_{jp} = (\lambda_j + \delta_p + c_{jp})\,\gamma_p. \qquad (2)$$

Here, the $\gamma_p$ term refers to a protein-specific quantity that helps to
account for error in estimating the sample protein expressions.

In  order  to  motivate  this  model,  we  briefly  discuss  the 
quantification  step.  This  is  the  only  step  that  has  been  explicitly 
addressed in RPPA data (see Hu et al., 2007; Mircean et al., 2005;
Nishizuka et al., 2003b; Tabus et al., 2006; and Supplementary
Material).

The  purpose  of  the  quantification  is  to  estimate  the  relative  amount 
of  protein  in  a  sample  as  compared  with  the  other  samples  on 
the  same  array  using  information  from  the  dilution  series  and  the 
observed  intensities.  This  is  accomplished  by  establishing  a  model 
relationship  between  the  observed  spot  intensities  and  unknown 
relative  expressions.  Various  groups  have  developed  methods 
for  RPPA  quantification  including  models  that  use  only  sample 
information (Mircean et al., 2005; Nishizuka et al., 2003b) and ‘joint
sample’ models that borrow strength from all samples on the array
(Hu et al., 2007; Tabus et al., 2006). We use a joint sample model
developed by our group at MD Anderson, called ‘SuperCurve’. This
model  is  explained  in  detail  in  the  Supplementary  Material.  Briefly, 
a  three  parameter  logistic  equation  is  used  to  model  the  dependency 
of  the  observed  intensity  on  the  unknown  protein  expression.  There 
is  one  overall  logistic  curve  estimated  for  the  whole  array  and 
individual  protein  expressions  are  estimated  as  offsets  from  the 
overall  curve.  The  logistic  equation  parameters  and  sample  protein 
expressions  are  estimated  iteratively  (see  Supplementary  Material). 
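To make the shape of such a quantification model concrete, the sketch below evaluates a generic three-parameter logistic curve along a 5-spot series of 1/2 dilutions. It is only an illustration: the parameter names (floor, span, slope), the parameter values and the exact functional form are assumptions, since the actual SuperCurve parameterization and fitting procedure are given in the Supplementary Material and are not reproduced here.

```python
import math


def logistic_intensity(x: float, floor: float, span: float, slope: float) -> float:
    """Generic three-parameter logistic: floor + span / (1 + exp(-slope * x))."""
    return floor + span / (1.0 + math.exp(-slope * x))


def dilution_series_intensities(log2_expr, floor, span, slope, n_spots=5):
    """Expected intensities for successive 1/2 dilutions of one sample.

    Each dilution step halves the protein amount, i.e. subtracts 1 on the
    log2 scale, so a sample is an offset along one shared curve.
    """
    return [
        logistic_intensity(log2_expr - step, floor, span, slope)
        for step in range(n_spots)
    ]


# Hypothetical curve parameters, chosen only for illustration.
CURVE = {"floor": 100.0, "span": 60000.0, "slope": 1.0}

high = dilution_series_intensities(log2_expr=1.0, **CURVE)
low = dilution_series_intensities(log2_expr=0.0, **CURVE)

for step, (a, b) in enumerate(zip(high, low)):
    print(f"dilution step {step}: {a:9.1f} {b:9.1f}")
```

Because both samples share one curve, the less concentrated sample's undiluted spot coincides with the first dilution of the sample that has twice as much protein; this is the sense in which expressions are offsets from the overall curve.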

Sample  protein  expressions  are  estimated  relative  to  the  other 
samples  on  the  array,  and  are  reported  without  units.  Since  most 
estimation  models,  including  SuperCurve,  compute  log  expression 
values,  in  this  article,  we  treat  all  expressions  as  on  the  log  scale. 

Array  quantification  is  performed  individually  for  each  array. 
We  have  observed  that  this  separate  estimation  of  protein  expression 
can  result  in  unexpected  multiplicative  effects.  For  example,  one 
experiment  that  we  ran  involved  more  samples  than  could  be  printed 
on  a  single  array.  We  randomly  allocated  each  sample  to  one  of 
two  groups,  balancing  for  all  potentially  explanatory  covariates  that 
we could identify in advance. These two groups of samples were
then  printed  on  two  parallel  sets  of  arrays,  which  were  interrogated 
with  the  same  sets  of  antibodies.  Due  to  the  randomization,  we  knew 
that  the  distributions  of  protein  expression  for  a  given  antibody 
should  be  the  same  for  Groups  1  and  2.  However,  comparison  of  the 
expression  distributions  showed  an  unexpected  shift  in  scale  due  to 
slightly  different  estimates  of  the  slopes  in  the  logistic  curves  used 
in  the  quantification  of  these  arrays.  These  small  differences  are  a 
result  of  error  in  the  estimates  of  the  logistic  parameters.  However, 
while  the  differences  in  slope  estimates  were  slight,  the  range  of 
sample  loadings  was  broad,  so  the  final  expression  estimates  (and 
the  protein  clusters)  were  quite  different.  It  is  important  to  note  that 
while  this  experiment  (with  samples  split  across  arrays)  first  led  us 
to  identify  the  problem,  these  shifts  in  logistic  slope  are  also  present 
[and  can  be  fixed  with  variable  slope  (VS)  normalization]  in  the 
more  common  design  context  where  all  samples  are  printed  on  one 
array. 

The  usual  normalization  model  in  Equation  (1)  fails  to  capture  the 
fact  that  protein  expression  is  estimated  separately  for  each  slide  and 
each  array  can  have  slightly  different  slopes  in  the  overall  logistic 
curve.  Small  errors  in  the  slope  parameter  of  the  logistic  curve  can 
result  in  large  variation  if  not  properly  accounted  for. 
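A small numeric sketch shows why a slight slope error matters when sample loading is broad. Under the multiplicative model of Equation (2), applying the purely additive correction of Equation (1) leaves a residual of (gamma_p - 1) times (sample effect + array effect) on top of the scaled signal. The 5% slope error and the loading values below are hypothetical, chosen only for illustration.

```python
def additive_residual(lam: float, delta: float, c: float, gamma: float) -> float:
    """Value left after removing additive effects from a multiplicative model."""
    x = (lam + delta + c) * gamma  # observed value under Equation (2)
    return x - lam - delta         # correction implied by Equation (1)


GAMMA = 1.05  # hypothetical 5% scale error on one array

# With no true signal (c = 0) the residual should ideally be 0 for every sample.
for lam in (-10.0, -2.0, 6.0):
    bias = additive_residual(lam, delta=0.0, c=0.0, gamma=GAMMA)
    print(f"sample loading {lam:6.1f} -> spurious expression {bias:+.2f}")
```

The spurious term grows in proportion to the sample loading, so two samples with identical true expression but very different loading appear differentially expressed on that array.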

The  proposed  model,  Equation  (2),  adjusts  for  variability  in  slide- 
to-slide  expression  estimates  when  adjusting  for  sample  loading. 
The $\gamma_p$ term refers to a protein-specific quantity that accounts for
potentially differing slopes of the sigmoidal curve
estimated with the quantification methods described in Tabus et al.
(2006), Hu et al. (2007) and the Supplementary Material. We call the
new  approach  VS  normalization  because  it  accounts  for  variation  in 
the  estimated  slope  parameters  from  the  calibration  curve  estimated 
in  the  quantification  step. 

2  METHODS 

Normalization,  using  either  Equations  (1)  or  (2),  requires  estimation  of 
both  array  and  sample  effects.  Equation  (2)  additionally  requires  estimating 
multiplicative  array  effects.  We  first  address  additive  array  and  sample  effects 
and  then  the  multiplicative  array  effects. 

2.1  Array  and  sample  effects 

Array effects, $\delta_p$ in Equations (1) and (2), are actually common and even
expected  since  each  slide  is  quantified  separately  and  expression  estimates  are 
relative  within  slides.  These  effects  are  corrected  by  normalizing  expression 
to  the  median  slide  expression  estimate  so  that  each  array  has  the  same 
median  expression. 

Sample effects, $\lambda_j$, occur when the amount of total protein that is spotted on
the  array,  the  sample  loading,  varies  from  sample  to  sample.  Unintentionally 
printing  differing  amounts  of  total  protein  for  each  sample  can  result  in  false 
conclusions  of  differential  expression.  Although  efforts  are  made  when  the 
array  is  being  printed  to  equalize  total  protein,  this  is  often  an  unavoidable 
problem.  For  example,  the  same  number  of  cells  can  be  used  in  each 
biological  sample,  but  if  the  size  of  the  cells  differs,  then  samples  with 
larger  cells  will  have  more  protein. 

Sample loading has been estimated with a ‘housekeeping’ (HK) protein,
such as β-Actin, as in Jiang et al. (2006) and Mendes et al. (2007). A HK
protein is a protein that should be present in the same amount for all samples
so differences in expression reflect differences in sample loading. However,
in reality there is no protein that meets this expectation, and the expression
levels of HK proteins can be quite variable. We refer to normalization with
Equation (1), estimating $\lambda_j$ with a HK protein, as HK normalization.


Another method that is used to estimate sample loading for the j-th sample
is to use the median protein expression estimate for sample j across all
the arrays, $\lambda_j = \mathrm{median}_p(x_{jp})$. This method assumes, first, that all the arrays
were printed in a similar manner and, second, that most proteins will not
be abnormally expressed but the few that are will still be noticed after
normalization to the median. It is important to note that this method requires
a set of arrays with the same samples. Normalization with Equation (1) but
estimating $\lambda_j$ with the median is called median loading (ML) normalization.
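The two additive normalizations can be sketched as follows, for a matrix with one row per sample and one column per array. This is a minimal illustration, not the authors' R code: the function names are hypothetical, and centering each array on its median before estimating the sample loading is an assumed order of operations.

```python
from statistics import median


def center_arrays(x):
    """Subtract each array's (column's) median so every array has median 0."""
    n_arrays = len(x[0])
    col_medians = [median(row[p] for row in x) for p in range(n_arrays)]
    return [[row[p] - col_medians[p] for p in range(n_arrays)] for row in x]


def subtract_loading(x, loadings):
    """Remove one loading estimate per sample (row)."""
    return [[v - lam for v in row] for row, lam in zip(x, loadings)]


def median_loading_normalize(x):
    """ML normalization: loading = median of the sample across all arrays."""
    centered = center_arrays(x)
    loadings = [median(row) for row in centered]
    return subtract_loading(centered, loadings), loadings


def housekeeping_normalize(x, hk):
    """HK normalization: loading = median-centered housekeeping expression."""
    centered = center_arrays(x)
    hk_median = median(hk)
    loadings = [h - hk_median for h in hk]
    return subtract_loading(centered, loadings), loadings


# Toy data with no protein signal: x[j][p] = sample effect j + array effect p.
x = [[1, 5, -3], [3, 7, -1], [-1, 3, -5]]
ml_normalized, ml_loadings = median_loading_normalize(x)
hk_normalized, hk_loadings = housekeeping_normalize(x, hk=[10, 12, 8])
print(ml_loadings, hk_loadings)
```

In this toy case both methods recover the same loadings because the hypothetical housekeeping column tracks the sample effects exactly; with a noisy housekeeping protein the two estimates would differ.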

2.2  Multiplicative  protein  effects 

The  array-specific  multiplicative  effects,  yp,  in  model  2  are  partially 
confounded  with  the  additive  protein  effects,  8p.  We  outline  a  method  that  we 
have  found  to  be  effective  in  estimating  parameters  and  performing  sample 
loading  normalization  according  to  the  VS  normalization  model. 

First,  write  (2)  as 

Xjp  =  (\j  T  cjp  )Yp~k  $p  Yp  •  (3) 

The  confounded  term,  8pyp,  is  lumped  together  as  the  overall  protein  effect 
and  estimated  with  the  median  of  protein  p  across  all  samples  [yp8p  = 
median p(xjp)].  Moving  this  term  to  the  left-hand  side,  (3)  can  be  written  as 

xjp  —  Yp  &p  =  Yp  (\j  T  cjp ) .  (4) 

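We will not be able to estimate the exact $\gamma_p$s, but taking the ratio of
(4) for two values of p will allow estimation of the relative $\gamma_p$s. This
ratio will be

\[ \frac{x_{jp_1} - \gamma_{p_1}\delta_{p_1}}{x_{jp_2} - \gamma_{p_2}\delta_{p_2}}
   = \frac{\gamma_{p_1}(\lambda_j + c_{jp_1})}{\gamma_{p_2}(\lambda_j + c_{jp_2})}
   \approx \frac{\gamma_{p_1}}{\gamma_{p_2}} \qquad (5) \]

where

\[ \frac{\lambda_j + c_{jp_1}}{\lambda_j + c_{jp_2}} \approx 1 \]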

since we assume that most $c_{jp}$s will be small relative to $\lambda_j$. We
also assume that the $\gamma_p$s have an expected value of 1 and a small
variance. They should realistically have a range of around 0.5-1.5, so there
should not be a danger of ratios behaving badly as the denominator gets close
to 0. We define $\tilde{x}_{jp} = x_{jp} - \gamma_p\delta_p$; hence Equation (5)
implies that $\tilde{x}_{jp_1}/\tilde{x}_{jp_2}$ estimates the ratio
$\gamma_{p_1}/\gamma_{p_2}$. This ratio can be estimated by regressing
$\tilde{x}_{jp_1}$ on $\tilde{x}_{jp_2}$. Since there is no preferred direction
(we could just as easily regress $\tilde{x}_{jp_2}$ on $\tilde{x}_{jp_1}$), we
use perpendicular least squares (de Groen, 1996; Rencher, 1995). The logs of
these ratios are used to set up a system of equations whose solution yields
estimates of the $\log\gamma_p$s; the system is made non-singular by setting

\[ \frac{1}{P}\sum_{p}\log(\gamma_p) = 0, \]

where P denotes the number of arrays.

VS normalization is the process of adjusting the matrix $x_{jp}$ by dividing
each column by the appropriate $\gamma_p$, and subtracting from each row the
appropriate $\lambda_j = \operatorname{median}_p(x_{jp})$ to obtain the
estimate of $c_{jp}$.
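
The estimation steps of this subsection can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it uses every pair of arrays, fits each pairwise slope by perpendicular least squares through the origin of the median-centered columns, and solves the resulting system in closed form (with all pairs included and the mean of the log slopes fixed at 0, the least-squares solution is the average of each array's pairwise log ratios). Applying the final row-median subtraction to the median-centered, slope-scaled matrix is also an assumption. All function and variable names are hypothetical.

```python
import math
from statistics import median


def perpendicular_slope(u, v):
    """Slope b of the line v = b * u through the origin that minimizes
    the sum of squared perpendicular distances."""
    suu = sum(a * a for a in u)
    svv = sum(b * b for b in v)
    suv = sum(a * b for a, b in zip(u, v))
    if suv <= 0:
        raise ValueError("arrays must be positively related to take a log ratio")
    diff = svv - suu
    return (diff + math.sqrt(diff * diff + 4.0 * suv * suv)) / (2.0 * suv)


def vs_normalize(x):
    """x[j][p] is the log expression of sample j on array p.
    Returns (estimated c_jp matrix, estimated gamma_p list)."""
    n_arrays = len(x[0])
    # Lumped protein effect gamma_p * delta_p: the median of each column.
    col_median = [median(row[p] for row in x) for p in range(n_arrays)]
    centered = [[row[p] - col_median[p] for p in range(n_arrays)] for row in x]
    columns = [[row[p] for row in centered] for p in range(n_arrays)]

    # Pairwise slopes estimate gamma_a / gamma_b; averaging their logs over b
    # (the self-ratio contributes log 1 = 0) enforces mean(log gamma) = 0.
    log_gamma = []
    for a in range(n_arrays):
        logs = [math.log(perpendicular_slope(columns[b], columns[a]))
                for b in range(n_arrays) if b != a]
        log_gamma.append(sum(logs) / n_arrays)
    gamma = [math.exp(g) for g in log_gamma]

    # Divide each column by its slope, then subtract each row's median loading.
    scaled = [[row[p] / gamma[p] for p in range(n_arrays)] for row in centered]
    normalized = [[value - median(row) for value in row] for row in scaled]
    return normalized, gamma


# Noise-free toy data following x_jp = (lambda_j + delta_p) * gamma_p.
lam = [-3.0, -1.0, 0.0, 2.0, 5.0]
delta = [0.5, -1.0, 2.0]
true_gamma = [0.8, 1.0, 1.25]  # logs sum to zero
x = [[g * (l + d) for g, d in zip(true_gamma, delta)] for l in lam]
c_hat, gamma_hat = vs_normalize(x)
```

With no noise term in the toy data, the sketch recovers the slopes and leaves a residual matrix of zeros, which is a quick sanity check on the algebra in Equations (3)-(5).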

3  SIMULATIONS 

We  ran  simulations  to  compare  VS,  ML  and  HK  normalization. 
We  randomly  generated  30  proteins,  each  with  200  samples,  from 
independent standard normal distributions. Array and sample
effects were generated according to the following distributions,
based  on  empirical  data: 


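\[ \lambda_j \sim N(-2, 16) \qquad (6) \]

\[ \delta_p \sim N(-1, 4) \qquad (7) \]

\[ \log\gamma_p \sim N(0, 0.01) \qquad (8) \]

The ‘HK protein’ was modeled as:

\[ x_{jp_{\mathrm{house}}} = \lambda_j + \epsilon_j, \quad \epsilon_j \sim N(0, 0.5). \qquad (9) \]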
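
One way to generate a single simulated data set of this kind is sketched below in Python. Several points are assumptions rather than statements from the text: the second argument of each N(., .) is read as a variance, Equation (6) is taken to describe the sample loadings and Equation (7) the protein effects, the data follow model (2) in the form x_jp = (lambda_j + delta_p + c_jp) * gamma_p, and the HK protein is the sample loading plus noise. The function name and defaults are hypothetical.

```python
import math
import random


def simulate_arrays(n_samples=200, n_arrays=30, seed=0):
    """Return (x, hk, lam, delta, gamma) for one simulated RPPA data set."""
    rng = random.Random(seed)
    # random.gauss takes a standard deviation, so variances are square-rooted.
    lam = [rng.gauss(-2.0, math.sqrt(16.0)) for _ in range(n_samples)]   # (6)
    delta = [rng.gauss(-1.0, math.sqrt(4.0)) for _ in range(n_arrays)]   # (7)
    gamma = [math.exp(rng.gauss(0.0, math.sqrt(0.01)))                   # (8)
             for _ in range(n_arrays)]
    # True expression c_jp: independent standard normal draws.
    x = [[(lam[j] + delta[p] + rng.gauss(0.0, 1.0)) * gamma[p]
          for p in range(n_arrays)]
         for j in range(n_samples)]
    # 'HK protein': sample loading observed with noise.                  # (9)
    hk = [lam[j] + rng.gauss(0.0, math.sqrt(0.5)) for j in range(n_samples)]
    return x, hk, lam, delta, gamma


x, hk, lam, delta, gamma = simulate_arrays()
```

Fixing the seed makes a run reproducible; the correlated and spiked-in proteins described in the text would be added to the true expression before the effects are applied.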
Table 1. Results of the simulation comparing the MSEs of HK, ML and VS
normalization

Contrast                     HK MSE   ML MSE   VS MSE   Low MSE
1  Correlated columns         0.023    0.029    0.005     0.002
2  Differential expression    3.429    3.506    2.431     2.000

The theoretical minimum (Low) is also shown. We looked at two contrasts: (i) the
correlation between two columns that should have a correlation of 0.6 and (ii) the
difference between an unexpressed sample and a sample with spiked-in expression
(with a value of 5) within the same protein.


In  comparing  the  three  methods,  we  wanted  to  assess  the  ability 
of  each  (i)  to  recover  true  protein  correlation  and  (ii)  to  detect 
differential  expression.  To  this  end,  two  protein  expression  vectors 
were  generated  to  have  a  correlation  of  0.6,  and  another  was  ‘spiked’ 
with  expression  by  adding  a  constant  to  one  of  the  samples. 

After  each  simulation,  we  performed  normalization  with  the  three 
methods  and  computed  (i)  correlation  between  correlated  proteins 
and  (ii)  differential  expression.  Each  target  was  compared  with  the 
truth  using  an  estimated  mean  squared  error  (MSE)  defined  as 

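\[ \mathrm{MSE} = \operatorname{var}(\hat{\theta}_i)
   + \Bigl(\theta - \frac{1}{n}\sum_{i=1}^{n}\hat{\theta}_i\Bigr)^2 \qquad (10) \]

where $\theta$ is the true value of the contrast (i.e. true correlation or true
differential expression), $\hat{\theta}_i$ is the value of the contrast for the
i-th simulation, and n is the number of simulations.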
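
As a small illustration of this MSE estimate (variance of the simulated contrasts plus squared bias), the Python sketch below assumes the sample variance is intended; the text does not say whether the sample or population variance was used. The function name is hypothetical.

```python
from statistics import mean, variance


def contrast_mse(estimates, truth):
    """Estimated MSE: variance of the per-simulation contrast values
    plus the squared difference between the truth and their mean."""
    bias = truth - mean(estimates)
    return variance(estimates) + bias ** 2


# Two hypothetical correlation estimates scattered around a true value of 0.6.
example = contrast_mse([0.5, 0.7], truth=0.6)
```

Here the estimates are unbiased, so the whole MSE comes from the variance term.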

Table 1 shows the MSE of the contrasts after 1000 simulations.
VS normalization has a lower MSE than both of the other methods
for both contrasts and is good at maintaining correlation between
proteins with known correlation.

The last column of the table shows the ‘Lowest’ MSE, or what the
MSE would be if the parameters were known. This number is not 0
because of randomness in the data.

We  performed  a  second  set  of  simulations  in  which  we  varied  the 
simulation  parameters,  including  the  number  of  proteins,  the  number 
of samples and the SDs of $\lambda_j$, $\delta_p$ and $\gamma_p$ in Equations (6-8). The
results of these simulations (shown in the Supplementary Material)
similarly  show  that  VS  normalization  performs  as  well  or  better 
than  the  other  methods  in  all  situations.  The  differences  are  most 
dramatic,  however,  when  the  variability  in  the  sample  loadings  is 
large. 

We  ran  a  third  simulation  to  compare  only  VS  normalization 
with  ML  normalization  when  clustering  proteins.  We  generated  30 
proteins  with  200  samples  from  a  multivariate  normal  distribution 
with  a  covariance  structure  that  allowed  for  five  correlated  groups  as 
follows: Group 1, N = 3, r = 0.4; Group 2, N = 10, r = 0.2; Group 3,
N = 5, r = 0.2; Group 4, N = 5, r = 0.5; and Group 5, N = 1, r = 0.3.
The  column,  row  and  slope  effects  were  generated  according  to 
Equations  (6-8).  Figure  2  shows  a  plot  of  the  ‘true’,  observed  and 
normalized  data  matrices  from  a  typical  simulation. 

It  is  easy  to  distinguish  between  groups  for  the  true  data,  but 
grouping  becomes  scrambled  after  the  row  and  column  effects  are 
introduced.  ML  normalization  is  able  to  separate  some  of  the  groups 
but  still  leaves  many  proteins  scrambled.  VS  normalization  is  able 
to  recover  and  separate  all  the  groups  present  in  the  ‘true’  matrix. 
This  example  illustrates  the  strength  of  VS  normalization  to  recover 
true  correlation  structure  between  proteins  in  the  presence  of  high 
sample  loading  variability. 
Fig. 2. The first two principal components plotted against each other for
the  true  data  matrix  (A),  the  observed  data  matrix  (B),  the  data  matrix 
after  ML  normalization  (C)  and  the  data  matrix  after  VS  normalization  (D). 
There  are  five  groups  each  plotted  in  a  different  shade  and  symbol  (every 
point  represents  a  different  protein/array).  Each  panel  is  shown  in  principal 
component  space  so  rotation  of  axes  is  arbitrary;  it  is  only  important  how  the 
points  group  together.  VS  normalization  is  able  to  recover  the  group  structure 
observed  in  the  ‘true’  matrix,  while  ML  normalization  only  recovers  one  of 
the  groups. 


4  EXAMPLE  WITH  LEUKEMIA  DATA 

We  applied  the  normalization  methods  to  an  RPPA  experiment 
studying  protein  signatures  in  leukemia.  A  series  of  138  lysate 
arrays  were  printed  with  either  blood  or  marrow  samples  from  360 
patients  with  acute  lymphoblastic  leukemia  (ALL).  Each  sample  was 
printed in duplicate on the array; each replicate was printed in a
five-spot, 2-fold dilution series. We used SuperCurve (see Supplementary
Material) to estimate protein expression for each dilution series.

The sample loadings for these data are quite variable. Figure 3
shows the protein expression for two extreme samples across all
of the arrays. Figure 3A plots the expression before any loading
normalization, showing that two samples can differ by nearly 8 units
on a log2 scale (a 256-fold difference) just due to sample loading.
Figure 3B-D plots protein expression for the same two samples
after HK, ML and VS normalizations. There is still a slight loading
bias after HK normalization, but the other two methods are able to
correct this.

We performed hierarchical clustering of the 138 proteins,
using average linkage as the linkage method and Pearson’s
correlation coefficient as the distance metric. For both the VS and
ML normalization methods, we checked the robustness of the
protein clusters using bootstrap clustering (Kerr and Churchill, 2001;
Pollard and van der Laan, 2005). HK normalization is excluded
here because it did not remove all sample loading bias (Fig. 3).
The idea behind bootstrap clustering is to see how often each
pair of proteins clusters together in a set of bootstrapped samples.
Based on the median split silhouette statistic (see Pollard and
van der Laan, 2005), we assumed nine clusters. Figure 4 shows the
results of a bootstrap cluster test with 500 bootstrapped samples
after ML and VS normalizations. The figure ranges from perfectly
yellow, meaning the proteins always cluster in the same group,
to perfectly blue, meaning the proteins never cluster in the same
group (color version online). There are nine marginal colors that
were assigned based on group membership of the proteins after the
hierarchical clustering of the VS normalized data matrix. The colors
in the margins of the plot are arbitrary and are used only to illustrate
change in group membership. The figure demonstrates that protein
group membership changes depending on the normalization
method used, confirming that the normalization approach is an
important consideration. Although it is not possible to say which
grouping is correct, the clusters found after VS normalization appear
more robust, as seen by the tighter yellow squares along the diagonal.

Fig. 3. Protein expression for two extreme samples (a low expressed sample in black and a high expressed sample in gray) from an RPPA experiment with
138 slides. (A-D) The expression for the two samples across all arrays in the set. The array index is plotted on the x-axis and the estimated log expression is
plotted on the y-axis. When there is no normalization, there is nearly an 8 log2 unit difference in expression (256-fold) between these samples, primarily due
to sample loading effects. HK normalization mostly corrects for sample loading, but there is still a 4-fold sample loading bias that the HK protein does not fix.
Both median and VS normalization completely correct this level of observed sample loading bias. Note that there are differences in scale in each of the plots.

Fig. 4. Bootstrap cluster results after ML (A) and VS normalization (B).
The colors on the margin were assigned based on group membership after
clustering the VS normalized data matrix. The marginal colors are present
only to show that there is some shift in group membership depending on the
normalization method. The clusters found after VS normalization seem to be
more robust.

We attempted to determine which normalization method is most
consistently correct. The ALL samples were each printed in duplicate
on the array, so we performed hierarchical clustering with each set of
replicates separately after both VS and ML normalization methods.
Clustering with each replicate set should produce the same clusters,
since the same set of samples is used. We counted the number of
protein pairs that clustered together using the samples from one
replicate but did not cluster together using the samples from the
other replicate and divided this count by the total number of protein
pairs. The result is interpreted as the percentage of protein pairs that
did not cluster consistently. For the ALL samples from the set of
138 arrays, ML normalization inconsistently clustered 24% of the
protein pairs while VS normalization inconsistently clustered 17%
of the protein pairs. After filtering out non-informative samples, ML
normalization inconsistency dropped to 18% and VS normalization
inconsistency dropped to 12%. Both results are consistent with VS
normalization as the preferred method.
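
The replicate-consistency measure used in this section (the fraction of protein pairs that cluster together in one replicate set but not in the other) can be sketched in Python as follows. The function name and the toy cluster labels are hypothetical; the cluster assignments themselves are assumed to come from cutting each replicate's hierarchical clustering into the same number of groups.

```python
from itertools import combinations


def inconsistent_pair_fraction(labels_a, labels_b):
    """Fraction of protein pairs grouped together under one labeling
    but apart under the other. Label values need not match across
    labelings; only co-membership of each pair is compared."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same proteins")
    if len(labels_a) < 2:
        raise ValueError("at least two proteins are needed to form a pair")
    pairs = list(combinations(range(len(labels_a)), 2))
    disagreements = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return disagreements / len(pairs)


# Toy example: four proteins clustered from replicate 1 and replicate 2.
fraction = inconsistent_pair_fraction([0, 0, 1, 1], [0, 0, 0, 1])
```

Because only co-membership is compared, relabeling the clusters of one replicate does not change the result.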


5  DISCUSSION 

Protein  arrays  are  not  currently  in  as  widescale  use  as  genomic  arrays 
or  other  proteomic  techniques;  however,  they  are  becoming  more 
common.  Several  different  groups  have  published  analyses  using 
RPPA  technology  (Gulmann  et  al.  2005;  Hennessy  et  al.  2007;  Jiang 
et  al.  2006;  Korf  et  al.  2008;  Kornblau  et  al.  2009;  Park  et  al. 
2008;  Tibes  et  al.  2006).  Furthermore,  RPPAs  can  be  produced 
with  common  laboratory  materials  and  techniques,  making  them 
more  accessible  than  genomic  arrays  or  mass  spectrometry.  As  this 
technology  becomes  more  common,  appropriate  preprocessing  of 
the  data  will  be  even  more  important. 

It  is  not  a  trivial  problem  to  determine  the  best  way  or  ways  to 
normalize  RPPA  data.  We  have  presented  a  traditional  framework 
for  correcting  sample  and  array  effects.  We  further  introduced  a 
slight  modification  to  the  standard  procedure,  the  VS  normalization 
method,  which  normalizes  for  total  protein  while  taking  into 
account  error  that  can  be  introduced  during  the  quantification 
step.  Namely,  since  each  array  is  quantified  separately,  slight 
variations  in  the  estimated  logistic  slope  of  the  dose  response  curve 
can  create  problems  if  ignored.  The  VS  model  better  explains 
observed  behavior  and  matches  our  knowledge  of  protein  expression 
estimation. We have shown through simulation and an example with
real data that the VS method can recover true group correlations
better than simply normalizing to the median of the samples or to
a HK protein. We have also pointed out that the usual practice of
normalization to a HK protein can be problematic both because of
difficulties in finding a true HK protein and failures to remove all
sample loading bias.

The impact of the multiplicative protein effect, $\gamma_p$ in Equation
(2), depends on variability in the sample loadings. Since the $\gamma_p$s
are centered close to one, when the sample loadings have small
variability, the impact of $\gamma_p$ will also be small. However, the
relatively small values of $\gamma_p$ can have a large impact when variability
in  the  sample  loadings  is  large,  as  for  example  in  the  ALL  data 
cited  here.  In  these  cases,  it  is  especially  important  to  correct  for 
both  additive  and  multiplicative  protein  effects.  The  type  of  sample 
contributes  to  how  big  the  sample  loading  problem  can  be.  Cell 
lines,  for  example,  are  not  nearly  as  variable  as  tissue  samples  and 
usually  do  not  have  such  large  variations  in  sample  loading  across 
the  samples. 

The  problem  of  sample  loading  is  not  something  that  can  be 
resolved  or  even  seen  with  just  one  array.  Simulations  not  shown 
here  suggest  that  at  least  20  arrays  (proteins)  are  adequate  to  provide 
good  estimates  of  total  protein,  though  fewer  arrays  can  indicate  a 
sample  loading  problem. 

This is the first attempt that we know of to combine information
across arrays in an RPPA study instead of focusing on each array
individually.

Better estimation of the VS model parameters might be achieved
with other estimation methods, such as an iterative approach.
In the future, we will investigate this possibility and how better
estimates can further improve results. Although this procedure was
developed for RPPAs, it can be applied to any array assay in
which different samples are printed together on the same array.

Conflict of Interest: none declared.

Abstract

Clinical  resistance  to  epidermal  growth  factor  receptor 
(EGFR)  inhibition  in  lung  cancer  has  been  linked  to  the 
emergence  of  the  EGFR  T790M  resistance  mutation  or 
amplification  of  MET.  Additional  mechanisms  contributing 
to  EGFR  inhibitor  resistance  remain  elusive.  By  applying 
combined  analyses  of  gene  expression,  copy  number,  and 
biochemical  analyses  of  EGFR  inhibitor  responsiveness,  we 
identified  homozygous  loss  of  PTEN  to  segregate  EGFR- 
dependent  and  EGFR-independent  cells.  We  show  that  in 
EGFR-dependent  cells,  PTEN  loss  partially  uncouples  mutant 
EGFR  from  downstream  signaling  and  activates  EGFR,  thereby 
contributing  to  erlotinib  resistance.  The  clinical  relevance  of 
our  findings  is  supported  by  the  observation  of  PTEN  loss  in 
1 out of 24 primary EGFR-mutant non-small cell lung cancer
(NSCLC) tumors. These results suggest a novel resistance
mechanism in EGFR-mutant NSCLC involving PTEN loss.
[Cancer  Res  2009;69(8):3256-61] 


Introduction 

Activating mutations in the epidermal growth factor receptor
(EGFR) are present in ~10% of non-small cell lung cancers
(NSCLC) in Caucasian patients and in up to 40% of East-Asian
patients. By contrast, EGFR mutations are much rarer in
African Americans. These mutations lead to the “addiction” of
mutant cells to the oncogenic signals driven by mutant EGFR.
This dependency is thought to be the cause of the clinical
observations that EGFR-mutant tumors shrink when treated with
EGFR inhibitors (1, 2). Eventually, these tumors recur; in ~60% to
70% (3) of cases, this has been linked to the emergence of either
the T790M resistance mutation of EGFR or amplification of MET
(2-4). However, a mechanistic explanation for acquired resistance
in the remaining cases is lacking.


Note: Supplementary data for this article are available at Cancer Research Online
(http://cancerres.aacrjournals.org/).

Here, we used a large collection of genomically characterized
NSCLC  cell  lines  in  order  to  derive  genomic  features  that  segregate 
EGFR-dependent  from  EGFR-independent  EGFR-mutant  lung 
tumor  cells.  We  combined  computational,  biochemical,  and  cellular 
approaches  to  identify  novel,  clinically  relevant  mechanisms 
uncoupling  EGFR-dependent  tumors  from  downstream  signaling. 

Materials  and  Methods 

A  detailed  description  of  all  methods  is  given  in  the  Supplementary 
Methods.  As  part  of  a  larger  effort  to  characterize  the  genomes  of  NSCLC, 
we  have  collected  84  NSCLC  cell  lines,  which  we  analyzed  for  chromosomal 
gene  copy  number  alterations,  mutations,  as  well  as  transcriptional 
changes.  The  detailed  description  of  this  collection  will  be  published 
elsewhere. Here, a subset of 53 of these cell lines was studied
(Supplementary Table S1). Hierarchical clustering was performed using
dCHIP. Genomic lesions differentiating between erlotinib-sensitive and
erlotinib-insensitive cells were analyzed by inferring the mean copy number
of chromosomal windows from five contiguous loci. Statistical analyses
were  performed  using  R. 
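
The windowed copy-number summary mentioned above can be illustrated with a short Python sketch. The details are in the Supplementary Methods, which are not reproduced here, so the sketch assumes overlapping (sliding) windows over loci ordered along a chromosome; the function name and the toy values are hypothetical.

```python
def window_means(copy_numbers, width=5):
    """Mean copy number of every run of `width` contiguous loci.
    `copy_numbers` is assumed to be ordered by chromosomal position."""
    if width < 1 or len(copy_numbers) < width:
        raise ValueError("need at least `width` loci and a positive width")
    return [
        sum(copy_numbers[start:start + width]) / width
        for start in range(len(copy_numbers) - width + 1)
    ]


# Toy chromosome arm: five diploid loci followed by a five-locus homozygous deletion.
means = window_means([2, 2, 2, 2, 2, 0, 0, 0, 0, 0])
```

Windows lying entirely inside the deleted segment average to 0, while windows spanning the breakpoint take intermediate values.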

Results  and  Discussion 

In order to analyze oncogene dependencies in lung cancer, we
used a collection of 84 NSCLC cell lines that we have recently
characterized in depth genomically and phenotypically
(Supplementary Table S1; footnote 14).

We performed hierarchical clustering of gene expression data of
53 of these lines. In this analysis, the EGFR-mutant cell line, H1650,
did not share a cluster with all other EGFR-mutant cell lines
(Fig. 1A). This cell line has previously been reported to be
erlotinib-resistant, despite lacking known resistance mechanisms (Fig. 1A;
ref. 5).

Confirming these observations, H1650 cells were erlotinib-resistant
with a half-maximal inhibitory concentration (IC50) of
2.13 μmol/L (Fig. 1B). As previously reported, EGFR-mutant
HCC827 cells were erlotinib-sensitive (IC50, 0.02 μmol/L), whereas
H1975 cells expressing both the erlotinib-sensitizing L858R
mutation and the T790M resistance mutation were resistant
(IC50 > 10 μmol/L; Fig. 1B; refs. 5, 6). Treatment with 100 nmol/L of
erlotinib led to the dephosphorylation of EGFR in H1650 and
HCC827 but not in H1975 cells (Fig. 1C). However, although the
dephosphorylation of EGFR was accompanied by a reduction in
p-Akt levels in erlotinib-sensitive HCC827 cells, H1650 cells
retained high levels of p-Akt despite inhibition of EGFR (Fig. 1C).
By contrast, erlotinib-mediated inhibition of known signal
transducers of the EGFR such as ErbB3, STAT3, and ERK was similar to
the levels observed in HCC827, consistent with the uncoupling of
mutant EGFR from downstream survival signaling at the level of
Akt (Fig. 1C).

14 M.L. Sos et al., under revision.

Figure 1. An EGFR independence signature in H1650 cells. A, hierarchical clustering of 53 NSCLC cells according to gene expression. Erlotinib sensitivity
(IC50 < 1 μmol/L, red; IC50 > 1 μmol/L, gray) and EGFR mutations (EGFR-mutant, black; T790M, red; EGFR wild-type, gray) as well as MET amplification (black).
B, left, cellular viability as a function of erlotinib dose for all three cell lines studied. Right, mutation status and IC50 values. C, cells were treated with different doses of
erlotinib. Activation of EGFR and downstream signaling pathways was determined by analyzing the amount of phosphorylated versions of the respective proteins
in comparison with their total levels using phosphorylation-specific antibodies.


We speculated that chromosomal aberrations might be causatively
involved in this phenotype and searched for chromosomal
regions displaying differential copy numbers between H1650 cells
and the EGFR-mutant and erlotinib-sensitive cell lines. We
identified 13 H1650-specific chromosomal loci harboring nine
known genes, including a chromosomal region affected by
homozygous deletion 3' to the locus containing the tumor
suppressor gene PTEN (Fig. 2A; ref. 7). Furthermore, when
analyzing the transcription of IGFBP2, a marker predictive of
PTEN loss in glioblastoma (8), H1650 was the highest scoring line
in our panel (data not shown). PTEN counteracts Akt activation by
dephosphorylating phosphatidylinositol-3,4,5-trisphosphate (PIP3),
the product of class I phosphoinositide-3-kinases (7, 9). Because
PTEN loss has been shown to be involved in EGFR inhibitor
resistance in some tumor cell lines (10, 11) and in glioblastoma
patients (12), we reasoned that PTEN loss might also be involved in
the EGFR-independent phenotype of H1650. Furthermore, lack of
PTEN protein expression has previously been speculated to be
involved in erlotinib resistance in H1650 cells (13, 14).

To determine whether loss of PTEN protein in H1650 cells (13,
14) might be caused by genomic loss, we mapped the PTEN locus
by quantitative PCR. Fine-mapping followed by long-distance PCR
revealed that the homozygous deletion (spanning 16.8 kb) leads to
the deletion of the 3' part of exon 8 and the entire exon 9 (Fig. 2B).
The deletion results in a COOH-terminally truncated protein that
could only be detected using antibodies against NH2-terminal
epitopes (Fig. 2C). Previous functional genetics experiments have
shown a critical role of the COOH-terminal part of PTEN (15).
Thus, the COOH-terminal deletion in H1650 cells might be causally
involved in uncoupling mutant EGFR from downstream Akt
survival signaling.

We next analyzed a panel of 140 primary lung adenocarcinomas
(predominantly Caucasian patients), annotated for copy number
alterations and mutations in 623 genes, for the presence of
co-occurring lesions in PTEN and EGFR (16, 17). We found
co-occurrence of homozygous deletion of PTEN and EGFR mutation
in 1 out of 24 samples with EGFR mutations (Fig. 2D). Thus,
primary resistance of EGFR-mutant NSCLC might, in rare cases, be
due to homozygous loss of PTEN. Furthermore, we found
hemizygous loss of chromosome 10 to be significantly enriched
in EGFR-mutant patients in the cohort of 140 primary samples
(P = 0.012; data not shown). Loss of the other allele by mutation
might thus confer acquired resistance in patients initially
responding to EGFR inhibition. This notion is also supported by
a previous study reporting favorable survival of EGFR-mutant
patients with high expression of PTEN (18).

Figure 2. Genomic characterization of PTEN loss in H1650 cells. A, list of genes affected by differential lesions between H1650 cells and EGFR-mutant and
erlotinib-sensitive cell lines. B, left, screenshot showing chromosomal aberrations at chromosome 10 (Integrative Genomics Viewer; http://www.broad.mit.edu/igv/) of all
EGFR-mutant cells. Middle, 3'-region mapping of PTEN using quantitative PCR reveals a homozygous deletion deleting parts of exon 8 and the entire exon 9.
Right, the sequence bridging the breakpoint. C, left, PTEN protein status determined using immunoblotting in different NSCLC cell lines. Right, NH2-terminal and
COOH-terminal PTEN detection by immunoblotting. LNCAP cells, known to express a truncated version of PTEN, served as controls. D, analysis of EGFR mutations
(red) and homozygous deletions of PTEN (black) and PTEN mutations (blue) in 140 lung cancer biopsy specimens.

Figure 3. Erlotinib resistance in EGFR-mutated NSCLC with PTEN loss. A, left, in H1650PTEN cells, PTEN levels were determined by immunoblotting. Right, levels of
phospho-EGFR and phospho-AKT were assessed by immunoblotting in H1650, H1650MOCK, and H1650PTEN cells treated with erlotinib. B, left, in PC9PTENkd cells,
PTEN levels were determined by immunoblotting. Right, levels of phospho-EGFR and phospho-AKT were assessed in PC9, PC9CONTkd, and PC9PTENkd cells treated
with erlotinib. C, left, percentage of apoptotic cells (in %, analyzed by measuring the fraction of cells positive for Annexin V and/or propidium iodide by flow cytometry)
after treatment with either erlotinib (1 μmol/L) or control. Right, cumulative histograms of apoptosis induction. D, levels of Bim (EL, extra long; L, long; S, short),
phospho-ERK, phospho-AKT, and actin were measured after serum starvation (serum starvation “+”), EGF stimulation (EGF “+”), or treatment with erlotinib (1 μmol/L
erlotinib “+”) for 24 h.

We  reconstituted  wild-type  PTEN  in  H1650  cells  by  stable 
retroviral  expression  (Fig.  3A).  Reconstitution  of  PTEN  restored 
coupling  of  the  EGFR  signal  to  downstream  Akt  signaling  as 
evidenced  by  dephosphorylation  of  both  EGFR  and  Akt  upon 
erlotinib treatment (Fig. 3A). Cellular proliferation of H1650PTEN
cells treated with erlotinib was virtually identical to that seen in the
parental cells (data not shown) but combinatorial treatment of
H1650 cells with erlotinib and an AKT inhibitor led to a reduction
of viability when compared with cells treated with erlotinib alone
(Supplementary Fig. S1). However, when analyzing the fraction of
cells  undergoing  apoptosis  upon  treatment  with  erlotinib,  we 
observed  an  increase  of  apoptotic  H1650PTEN  cells  when  compared 
with  the  parental  and  the  mock-transduced  cells  (Fig.  3C).  Thus, 
PTEN  reconstitution  increases  the  susceptibility  to  erlotinib- 
induced  apoptosis  in  H1650  cells. 

We next silenced PTEN in EGFR-mutant and erlotinib-sensitive
PC9 cells by lentiviral short hairpin RNAs (Fig. 3B). Similar to our
observation in the parental H1650 cells, PTEN loss in PC9 cells
(PC9PTENkd) induced the uncoupling of EGFR and downstream
Akt signaling as shown by continuous Akt phosphorylation under
erlotinib treatment (Fig. 3B). Again, recapitulating our observations
in H1650 cells, silencing of PTEN expression in PC9 cells led to a
significant decrease in the fraction of apoptotic cells when treated
with erlotinib (Fig. 3C). Induction of apoptosis in both
PTEN-proficient and PTEN-deficient cells was paralleled by activation of
the proapoptotic protein Bim, recently shown to play a key role in
erlotinib-induced apoptosis in EGFR-mutant NSCLC (refs. 19, 20;
Fig. 3D). Thus, the differential induction of apoptosis is not
mediated through modulation of Bim levels. Interestingly, in
PC9PTENkd cell lines, we observed the activation of Erk under
steady-state and serum-starved conditions, whereas PTEN-proficient
cells hardly showed Erk activity (Fig. 3D). Thus, PTEN loss
partially uncouples EGFR signaling from downstream Akt survival
signaling, activates ERK, and contributes to EGFR inhibitor
resistance.

While analyzing the activity of Akt in PTEN-deficient H1650
and PC9PTENkd EGFR-mutant cells, we observed an increase in
phospho-EGFR when compared with PTEN-proficient cells. In
PC9PTENkd cells, complete deactivation of EGFR was achieved at
750 nmol/L of erlotinib, whereas in parental and control PC9 cells,
250 nmol/L of erlotinib was sufficient to fully dephosphorylate
the receptor (Fig. 4A). Thus, the resistance phenotype observed
in PTEN-deficient H1650 cells may be partially explained by the
prolonged activation of EGFR under treatment with EGFR tyrosine
kinase inhibitors. To test whether PTEN loss-induced EGFR
activation may be mimicked by stimulation of EGFR in
PTEN-proficient PC9 cells, we treated parental PC9 cells with a
combination of erlotinib and EGF (Fig. 4B). We observed an
induction of phospho-EGFR by dual EGF stimulation and EGFR
inhibition resembling the situation in PTEN-deficient cells
(Fig. 4B). Confirming the functional relevance of PTEN
loss-induced EGFR activation, this treatment also led to a reduction of
the fraction of apoptotic cells (Fig. 4B).

Figure 4. PTEN loss activates EGFR. A, phospho-EGFR was
detected by immunoblotting after short exposure (SE) and
long exposure (LE) in PC9, PC9CONTkd, and PC9PTENkd cells.
Actin levels served as a loading control. B, left, levels of
phospho-EGFR of PC9PTENkd and PC9 cells treated with
erlotinib were determined (+/- EGF) under serum starvation.
Right, apoptosis (%) after erlotinib treatment (0.5 μmol/L) in the
given cells. C, left, phospho-EGFR and phospho-AKT in H3255
and H3255MyrAKT cells were assessed by immunoblotting.
Right, the fraction of apoptotic cells (in %) in the given cells.
D, a simplified model explaining our observations: in
EGFR-mutant cells, EGFR is the sole input for production of
PIP3. Inhibiting EGFR dramatically reduces the input into PIP3
production. Therefore, the lack of negative regulation of PIP3
production by loss of PTEN is limited.

Finally, we asked whether survival signaling activated by loss of
PTEN is equivalent to immediate activation of Akt. We introduced
a constitutively active allele of Akt (MyrAkt) into EGFR-mutant
and erlotinib-sensitive H3255 cells. As expected, levels of
phospho-Akt, but not of phospho-EGFR, remained elevated in
H3255MyrAKT cells under erlotinib treatment (Fig. 4C).
Furthermore, this pronounced Akt activity was associated with
erlotinib resistance (P < 0.0005) of H3255MyrAKT cells when
measuring apoptosis (Fig. 4C). Thus, immediate and constitutive
activation of Akt is more effective than PTEN loss in inducing
erlotinib resistance in EGFR-mutant NSCLC cells.

Others have recently shown that PTEN loss leads to robust EGFR
inhibitor resistance in cells lacking EGFR mutations (10, 11). Our
findings in EGFR-mutant NSCLC cells differ from these
observations, as the phenotype elicited by PTEN loss was less dominant.
This discrepancy may be explained by the fact that EGFR-mutant
NSCLC cells are exclusively dependent on EGFR signaling for their
survival. Thus, erlotinib-mediated inhibition of EGFR as the sole
input of PIP3 production may only partially be rescued by PTEN
loss (Fig. 4D).


In summary, we have shown that in-depth genomic and
phenotypic analyses of large cell line collections can be applied
to identify a novel cell biology phenotype. Here, computational
genomic analyses implied homozygous deletion of PTEN as a
candidate for EGFR inhibitor resistance. Functional studies
revealed that PTEN loss induces a significant reduction in
apoptosis sensitivity in EGFR-mutant cells by activation of Akt
and EGFR. We speculate that activation of Erk in PTEN-deficient
cells (Fig. 3D) may lead to transcriptional up-regulation of EGFR
ligands, such as amphiregulin (21). Moreover, PTEN loss and EGFR
mutation co-occurred in 1 out of 24 EGFR-mutant patients in a
genomic analysis of 140 lung adenocarcinomas, thus confirming
the clinical relevance of our findings. Thus, PTEN loss may
represent an additional mechanism of initial or acquired resistance
to erlotinib-induced apoptosis in EGFR-mutant NSCLC.
Somatic genetic alterations in cancers have been linked with response to targeted therapeutics by creation
of specific dependency on activated oncogenic signaling pathways. However, no tools currently exist to
systematically connect such genetic lesions to therapeutic vulnerability. We have therefore developed a
genomics approach to identify lesions associated with therapeutically relevant oncogene dependency. Using
integrated genomic profiling, we have demonstrated that the genomes of a large panel of human non-small cell
lung cancer (NSCLC) cell lines are highly representative of those of primary NSCLC tumors. Using cell-based
compound screening coupled with diverse computational approaches to integrate orthogonal genomic and
biochemical data sets, we identified molecular and genomic predictors of therapeutic response to clinically
relevant compounds. Using this approach, we showed that v-Ki-ras2 Kirsten rat sarcoma viral oncogene
homolog (KRAS) mutations confer enhanced Hsp90 dependency and validated this finding in mice with
KRAS-driven lung adenocarcinoma, as these mice exhibited dramatic tumor regression when treated with an Hsp90
inhibitor. In addition, we found that cells with copy number enhancement of v-abl Abelson murine leukemia
viral oncogene homolog 2 (ABL2) and ephrin receptor kinase and v-src sarcoma (Schmidt-Ruppin A-2) viral
oncogene homolog (avian) (SRC) kinase family genes were exquisitely sensitive to treatment with the SRC/ABL
inhibitor dasatinib, both in vitro and when xenografted into mice. Thus, genomically annotated cell-line
collections may help translate cancer genomics information into clinical practice by defining critical pathway
dependencies amenable to therapeutic inhibition.


Introduction 

The dynamics of ongoing efforts to fully annotate the genomes
of all major cancer types are reminiscent of those of the Human
Genome Project. The analysis of somatic gene copy number
alterations and gene mutations associated with cancer (both
here referred to as lesions) will thus provide the genetic landscape
of human cancer in the near future. The medical implications
of these endeavors are exemplified by the success of molecularly
targeted cancer therapeutics in genetically defined tumors: the
ERBB2/Her2-targeted (where ERBB2 is defined as v-erb-b2
erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma-derived
oncogene homolog [avian]) antibody trastuzumab shrinks tumors
in women with ERBB2-amplified breast cancer (1); the
ABL/KIT/PDGFR (where ABL is defined as v-abl Abelson murine leukemia
viral oncogene homolog and KIT is defined as v-kit Hardy-Zuckerman
4 feline sarcoma viral oncogene homolog) inhibitor imatinib induces
responses in patients with chronic myeloid leukemia carrying the
BCR/ABL (where BCR is defined as breakpoint cluster region)
translocation (2, 3) as well as in patients with gastrointestinal stromal
tumors and melanomas bearing mutations in KIT (4) or PDGFRA
(5); and finally, EGFR-mutant lung tumors are highly sensitive to
the EGFR inhibitors gefitinib and erlotinib (6-8). In most cases,
such discoveries were made after the completion of clinical trials;
as yet, no robust mechanism exists that permits systematic
identification of lesions causing therapeutically relevant oncogene
dependency prior to initiation of such clinical trials.

The use of cancer cell lines allows systematic perturbation
experiments in vitro, yet the validity and clinical interpretability of
these widely used models have been questioned. In some notable
instances, pathways may lose function when cells are grown in culture (9).
In addition, cell lines are frequently thought to be genomically
disarrayed and unstable and therefore likely poorly representative of
primary tumors. Furthermore, the genetic diversity of
histopathologically defined classes of tumors is often substantial, e.g., the
clinical tumor entity non-small cell lung cancer (NSCLC)
comprises EGFR- and KRAS-mutant (where KRAS is defined as v-Ki-ras2
Kirsten rat sarcoma viral oncogene homolog) lung adenocarcinomas as
well as KRAS-mutant squamous-cell lung cancers. Thus, any
representative preclinical model would need to capture the nature of
lesions of primary tumors as well as their distribution in the
histopathologically defined cohort.

Recent reports have credentialed the use of cancer cell lines in
preclinical drug target validation experiments (10-13). Building on
the foundation of these studies, we have now established a cell-line
collection that enables systematic prediction of drug activity using
global profiles of genetic lesions in NSCLC. Given the genomic
diversity of a particular cancer type, we reasoned that in-depth
preclinical analyses of activity of cancer therapeutics in tumor cells
would require both thorough genomic analysis of a large cell-line
collection of a single tumor entity and high-throughput cell-line
profiling, followed by genomic prediction of compound activity.

We set out to systematically annotate the genomes of a large
panel of NSCLC cell lines in order to determine whether such a
collection reflects the genetic diversity of primary NSCLC tumors.
We further determined the phenotypic validity of this collection
and analyzed drug activity as a function of genomic lesions in a
systematic fashion. Finally, we confirmed the validity of our
predictors in vitro and in lung cancer mouse models. Such
complementary efforts may provide a framework for future preclinical
analyses of compound activity, taking into account the multitude
of genetic lesions in histopathologically defined cancer types.

Results 

A genomically validated collection of NSCLC cell lines. Eighty-four
NSCLC cell lines were collected from various sources
(Supplemental Table 1; supplemental material available online with this
article; doi:10.1172/JCI37127DS1) and formed the basis for all
subsequent experiments. Cell lines were derived from tumors
representing all major subtypes of NSCLC tumors, including
adenocarcinoma, squamous-cell carcinoma, and large-cell carcinoma.

The genomic landscape of these cell lines was characterized by
analyzing gene copy number alterations using high-resolution
SNP arrays (250K StyI). We used the statistical algorithm Genomic
Identification of Significant Targets in Cancer (GISTIC) to
distinguish biologically relevant lesions from background noise (14). The
application of GISTIC revealed 16 regions of recurrent, high-level
copy number gain (inferred copy number > 2.14) and 20 regions of
recurrent copy number loss (inferred copy number < 1.86)
(Supplemental Tables 2 and 3). Overall, we identified focal peaks with a
median width of 1.45 Mb (median 13.5 genes/region) for
amplifications and 0.45 Mb for deletions (median 1 gene/region). These
regions contained lesions known to occur in NSCLC (e.g., deletion
of LRP1B [2q], FHIT [3p], CDKN2A [9p]; amplification of MYC [8q],
EGFR [7p] and ERBB2 [17q]; Figure 1A and Supplemental Table 2).
Furthermore, within broad regions of copy number gain, we also
identified amplification of TITF1 (14q) and TERT (5p) (Figure 1A
and Supplemental Table 2), recently identified by large-scale
genomic profiling of primary lung adenocarcinomas (15-17).
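The copy number thresholds quoted above can be illustrated with a minimal sketch. This is not GISTIC, which additionally scores the recurrence and amplitude of lesions against a background model; the sketch only shows how a single inferred copy number would be labeled under the stated cutoffs. The function names and the example values are hypothetical.

```python
# Thresholds quoted in the text; everything else here is illustrative.
GAIN_THRESHOLD = 2.14
LOSS_THRESHOLD = 1.86


def call_copy_number(inferred_cn: float) -> str:
    """Label one inferred copy number as 'gain', 'loss', or 'neutral'."""
    if inferred_cn > GAIN_THRESHOLD:
        return "gain"
    if inferred_cn < LOSS_THRESHOLD:
        return "loss"
    return "neutral"


def alteration_frequency(cn_by_cell_line: dict, kind: str) -> float:
    """Fraction of cell lines whose call at one locus equals `kind`."""
    if not cn_by_cell_line:
        raise ValueError("no cell lines supplied")
    calls = [call_copy_number(cn) for cn in cn_by_cell_line.values()]
    return calls.count(kind) / len(calls)


# Hypothetical inferred copy numbers at a single locus (not study data).
example_locus = {"line_A": 3.2, "line_B": 2.0, "line_C": 1.1, "line_D": 2.5}
gain_fraction = alteration_frequency(example_locus, "gain")
```

Values exactly at a cutoff are treated as neutral here because the text states strict inequalities.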

Analysis of homozygous deletions as well as loss of
heterozygosity (LOH) is typically hampered by admixture of nontumoral cells
in primary tumors. The purity of cell-line DNA permitted
identification of previously unknown homozygous deletions and regions
of LOH, including LOH events resulting from uniparental disomy
(e.g., copy-neutral events) (Supplemental Table 4). In this analysis,
known genes such as MTAP (9p) and LATS2 (13q) were altered by
homozygous deletions (18, 19) and we found what we believe are
novel homozygous deletions of genes such as TUBA2 (Supplemental
Table 4). Of note, most of these regions could also be identified in
primary NSCLC tumors as deleted (15); however, inferred copy
numbers only inconsistently showed LOH or homozygous deletions,
indicating admixture of normal diploid DNA (Supplemental Table 4).
Thus, while a recent large-scale cancer profiling study (15) enabled
insight into the genomic landscape of lung adenocarcinoma, the use
of pure populations of tumor cells further afforded discovery of
previously unrecognized regions of homozygous deletions and LOH.

We next compared the profile of significant amplifications and
deletions in this cell-line collection with that of a set of 371
primary lung adenocarcinomas (15). This comparison revealed a
striking similarity between the 2 data sets (Figure 1A) but not between
NSCLC cell lines and gliomas or melanomas (Supplemental
Figure 1, A and B). A quantitative analysis of similarity by
computing correlations of the false discovery rate (q value) confirmed the
similarity of primary lung cancer and lung cancer cell lines (r = 0.77)
and the lack of similarity of lung cancer cell lines and primary
gliomas (14) (r = 0.44), melanoma cell lines (11) (r = 0.44), or ovarian
tumors (r = 0.38; Supplemental Figure 1C). As a control, repeated
random splitting of the lung cancer cell-line data and computation
of internal similarity resulted in correlation coefficients between
0.82 and 0.86, whereas we found no correlation with normal tissue
(r = 0.0195; Supplemental Figure 1C). These results demonstrate
that the genomic copy number landscape of NSCLC cell lines
reflects that of primary NSCLC tumors, while tumors or cell lines
of other lineages show a much lower degree of similarity (20, 21).
Furthermore, the distribution of oncogene mutations in the cell
lines (Supplemental Table 5) was similar to that in primary NSCLC
tumors, with a high prevalence of mutations in the KRAS and EGFR
genes (22-25) and rare occurrence of phosphoinositide-3-kinase,
catalytic, α polypeptide (PIK3CA) and v-raf murine sarcoma viral
oncogene homolog B1 (BRAF) mutations (Figure 1B). These results
further validate our cell-line collection on a genetic level.
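The split-half control described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: per-region alteration frequencies stand in for the GISTIC q values that were actually correlated, and the names `pearson_r`, `region_frequencies`, and `split_half_r` are hypothetical.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def region_frequencies(rows):
    """Per-region fraction of cell lines carrying an alteration (rows of 0/1)."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def split_half_r(rows, rng):
    """Randomly split the cell lines into two halves and correlate their profiles."""
    if len(rows) < 2:
        raise ValueError("need at least two cell lines to split")
    shuffled = list(rows)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return pearson_r(region_frequencies(shuffled[:half]),
                     region_frequencies(shuffled[half:]))


# Hypothetical binary alteration calls: one row per cell line, one column per region.
calls = [[1, 0, 1, 0], [1, 0, 0, 0], [1, 1, 1, 0], [1, 0, 1, 0], [0, 0, 1, 0], [1, 0, 1, 1]]
internal_similarity = split_half_r(calls, random.Random(0))
```

Repeating `split_half_r` with different seeds would give a range of internal correlations against which cross-data-set correlations can be judged.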
Figure 1

Genomic validation of 84 NSCLC cell lines. (A) Chromosomal copy number changes of NSCLC cell lines are plotted against those of 371 primary
NSCLC tumors. The q values (false discovery rates) for each alteration (x axis) are plotted at each genome position (y axis). Left panel shows
chromosomal losses (cell lines, purple; primary tumors, dark blue); right panel shows chromosomal gains (cell lines, red; primary tumors, blue).
Genomic positions corresponding to even-numbered chromosomes are shaded; dotted lines indicate centromeres; green lines, q value cutoff
(0.25) for significance. Genes represent known targets of mutation in lung adenocarcinomas. Putative targets near peaks are given in
parentheses. Genes identified by GISTIC using stringent filtering criteria for peak border detection are marked by asterisks. (B) Oncogene mutations
present in NSCLC cell lines (black bars) are plotted according to their relative frequencies in comparison with primary lung tumors (gray bars)
(22-25). (C) Transcriptional profiles of primary renal cell carcinomas (orange) and corresponding cell lines (red); primary lung tumors (dark
green) and lung cancer cell lines (light green); primary lymphomas (blue) and lymphoma cell lines (purple) were analyzed by hierarchical
clustering. To reduce noise, probe sets were filtered prior to clustering (coefficient of variation from 1.0 through 10.0; present call rate, 20%; absolute
expression greater than 100 in more than 20% of samples).


The availability of both copy number alteration and oncogene
mutation data of the NSCLC cell lines enabled us to analyze the
interactions of both types of lesions (Supplemental Figure 2).
Hierarchical clustering of lesions robustly grouped both mutations and
amplification of EGFR in 1 subcluster (ratio Q of observed vs.
expected cooccurrence: Q = 4.38, P = 0.001), while KRAS mutations
consistently grouped in a distinct cluster. These findings corroborate prior
observations in vivo in which mutations in KRAS and EGFR were
mutually exclusive while EGFR mutation and EGFR amplification
frequently cooccurred (23, 26, 27). Moreover, these results suggest
that these mutations influence the particular signature of genomic
alterations in the affected tumors. Finally, in unsupervised
hierarchical cluster analyses of gene expression data, primary lung cancer
specimens (28) and lung cancer cell lines shared 1 cluster (Figure
1C), while renal cell carcinomas (29) and lymphomas (30) as well as
the corresponding cell lines clustered in a separate group.
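The ratio Q of observed versus expected cooccurrence quoted above can be sketched as follows. The text does not give the exact formula, so this assumes the expected count is the one implied by statistical independence of the two lesions; `cooccurrence_ratio` and the example vectors are hypothetical.

```python
def cooccurrence_ratio(has_a, has_b):
    """Observed / expected cooccurrence of two lesions across cell lines.

    `has_a` and `has_b` are equal-length sequences of 0/1 flags, one per
    cell line. The expected count assumes the lesions occur independently.
    """
    if len(has_a) != len(has_b) or not has_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(has_a)
    observed = sum(1 for a, b in zip(has_a, has_b) if a and b)
    expected = sum(has_a) * sum(has_b) / n
    if expected == 0:
        raise ValueError("ratio is undefined when a lesion never occurs")
    return observed / expected


# Hypothetical lesion flags for eight cell lines (not study data).
mutation = [1, 1, 0, 0, 0, 0, 0, 0]
amplification = [1, 1, 0, 0, 0, 0, 0, 0]
q_ratio = cooccurrence_ratio(mutation, amplification)
```

Under this definition, Q above 1 indicates that two lesions cooccur more often than chance would predict, and Q of 0 indicates mutual exclusivity in the sample.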

In summary, in-depth comparative analysis of orthogonal
genomic data sets of a large panel of NSCLC cell lines and primary
tumors demonstrates that these cell lines reflect the genetic and
transcriptional landscape of primary NSCLC tumors.

Figure 2

Robustness of phenotypic properties of EGFR-mutant lung cancer cells in vivo. (A) The first 2 principal components (PC1 and PC2) distinguish
cell lines with mutated (mut) EGFR (red dots) and WT EGFR (blue dots) (n = 54). (B) The signature (fold change greater than 2; absolute
difference, 100; P < 0.01) of EGFR-mutant cell lines (n = 8/54) was used for hierarchical clustering of 123 primary adenocarcinomas (35) annotated
for the presence (EGFRmut, red bars) or absence (EGFRWT, dark blue bars) of EGFR mutations. (C) Probability of survival was estimated for all
123 primary adenocarcinomas with known EGFR mutation status following grouping according to relative abundance of 337 RNA transcripts
identified as differentially expressed between EGFR-mutant and EGFR WT cell lines. EGFR-mutant tumors (n = 13) were excluded from survival
analyses. Survival probabilities are depicted as Kaplan-Meier survival estimate curves. (D) The same analysis was performed using 86 lung
tumors from Beer et al. (37) with available survival data. Two groups were formed according to relative abundance of the EGFR mutation-specific
genes, and survival analysis was performed as in C. (E) The association between presence (amplification, green; mutation, red; deletion, yellow)
of genetic lesions identified in the cell lines and sensitivity of the respective cell lines to treatment with the EGFR inhibitor erlotinib was analyzed
by Welch’s t test and Fisher’s exact test. Significant lesions are marked by gray (P < 0.05) or black (P < 0.0001) boxes.

EGFR mutations define phenotypic properties of lung tumors in vitro
and in vivo. Activated oncogenes typically cause a transcriptional
signature that can be used to identify tumors carrying such
oncogenes (31, 32). However, we consistently failed to identify a gene
expression signature characteristic of EGFR-mutant tumors (33,
34) using a gene expression data set of 123 primary lung
adenocarcinomas (35) annotated for mutations in EGFR (data not shown).
We therefore reasoned that the cellular purity of our cell lines
(n = 54 analyzed on U133A) might enable the determination of
such a signature and the application of this signature in primary
tumors. We applied principal component analyses on the variable
genes and found a remarkable grouping of all EGFR-mutated cell
lines (n = 8/54), with a significant dissociation already in the first
principal component (Welch’s t test on the distribution of
eigenvalues: P = 0.0005) contributing 14.5% to the overall variance (Figure
2A). Similar results were obtained by hierarchical clustering (data
not shown). Using genes differentially expressed in EGFR-mutant
cell lines (including T790M) as a surrogate feature
(Supplemental Table 6), all of the EGFR-mutant primary tumors (35) were
grouped in a distinct cluster (P = 0.00001) when performing
hierarchical clustering (Figure 2B). This result was also recapitulated
when selecting genes differentially expressed in erlotinib-sensitive
(GI50 < 0.1 μM, n = 5/54 vs. GI50 > 2 μM, n = 45, where GI50 indicates
half-maximal growth inhibitory concentration) cell lines
(Supplemental Figure 3A). Furthermore, patients with tumors
expressing the signature of EGFR-mutated cell lines had better overall
survival than those whose tumors did not (Figure 2C) (36). The
power of our EGFRmut signature to predict survival was confirmed,
employing the data published by Beer and colleagues (Figure 2D)
(37). This effect was even observed when excluding EGFR-mutant
tumors (n = 13) from the analysis (Figure 2C). Thus, expression
signatures extracted in vitro can be used to identify biologically
diverse tumors in vivo (38).
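The group comparison on the first principal component can be sketched with Welch's unequal-variance t statistic. This assumes the compared quantity is each cell line's score on the first principal component, which the text does not spell out; the function name and the example scores are hypothetical, and the P value itself would additionally require the t distribution, which is not computed here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's unequal-variance t test."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two values")
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical first-component scores (not values from the study).
pc1_mutant = [4.1, 3.8, 4.6, 5.0]
pc1_wild_type = [0.2, -0.5, 1.1, 0.4, -0.9, 0.7]
t_stat, dof = welch_t(pc1_mutant, pc1_wild_type)
```

Welch's variant is the natural choice when the two groups differ in size and spread, as with 8 mutant versus 46 wild-type cell lines.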

Others have recently characterized a transcriptional signature
of EGFR-mutant NSCLC using a small set of cell lines (39).
However, when analyzing primary lung adenocarcinomas with the
signature described by Choi et al., EGFR-mutant samples were
randomly distributed across the data set (Supplemental Figure 3B).
This finding further highlights the importance of using large
cell-line collections in order to represent the overall genomic
diversity of primary tumors.

Figure 3

Sensitivity profiles of compounds determined by high-throughput
cell-line screening. GI50 values (y axes) for 12 compounds are shown
for the successfully screened (Supplemental Table 5) cell lines
(x axes show individual cell lines). Because rapamycin typically
fails to completely abrogate cellular proliferation (79), the 25%
inhibitory concentration is shown for this compound. Bars represent
GI50 (GI25 values in the case of rapamycin, y axis) throughout the
cell-line collection (x axis) ranked according to sensitivity. The
maximum concentration is adapted to the GI50 value (GI25 values in
the case of rapamycin; 10 μM for 17-AAG, erlotinib, vandetanib,
lapatinib, sunitinib, rapamycin, and PD168393; 30 μM for SU-11274,
dasatinib, and purvalanol; 60 μM for VX-680; 90 μM for U0126) of
resistant cell lines. The 5 most sensitive cell lines for each
compound are highlighted in table form.

Recent studies have linked the presence of EGFR mutations in
lung adenocarcinomas to clinical response to the EGFR inhibitors
erlotinib and gefitinib (6-8). However, retrospective studies aimed
at determining predictive markers for EGFR inhibition yielded
heterogeneous results, implicating EGFR mutations and/or
EGFR amplifications among others as predictive of response or
patient outcome (40-42). We set out to systematically identify
genetic lesions associated with sensitivity to erlotinib by
including all global lesion data from our genomics analyses rather than
focusing on EGFR-associated lesions. We established a
high-throughput cell-line screening pipeline that enables systematic
chemical perturbations across the entire cell-line panel followed
by automated determination of GI50 values (43) to determine
erlotinib sensitivity for all cell lines. We next analyzed the
distribution of genetic lesions in erlotinib-sensitive compared with
insensitive cell lines (Supplemental Tables 5 and 7) and further
compared the mean sensitivity of cell lines with and without
the respective genetic lesions. In both analyses, EGFR
mutations were the best single-lesion predictor of erlotinib
sensitivity (Figure 2E and Supplemental Table 7; Fisher’s exact test;
P = 6.9 × 10⁻⁸). Furthermore, we found a less stringent association
with amplification of EGFR (Fisher’s exact test; P = 1.4 × 10⁻⁴);
however, only EGFR mutations were significant predictors of
erlotinib sensitivity when we adjusted for multiple hypothesis
testing using Bonferroni’s correction (data not shown). We next
used signal-to-noise-based feature selection combined with the
K-nearest-neighbor (KNN) algorithm (44, 45) to build a
multilesion predictor of erlotinib sensitivity. The best performing
multilesion predictor comprised EGFR mutations, amplification of EGFR,
and lack of KRAS mutations (Figure 2E and Supplemental Table
7), which have all been implicated in determining responsiveness of
NSCLC patients to EGFR inhibitors (6-8, 27, 40, 41, 46). We note
that in our data set, as in previously published reports (6-8, 27, 40,
41, 46), EGFR amplification and mutation were correlated, whereas
KRAS mutations were mutually exclusive with either lesion
(Supplemental Figure 2). Thus, our observation confirms the overall
predominant role of EGFR mutations in predicting responsiveness
to EGFR inhibition, and it provides an explanation for the finding
of EGFR amplification as being predictive of response as well. Our
findings also corroborate prior clinical reports establishing KRAS
mutations as a resistance marker for EGFR inhibition therapy.
Together, these results imply that essential transcriptional and
biological phenotypes of the original tumors are preserved in the
cell lines, a necessary requirement for application of such
collections as proxies in preclinical drug target validation efforts.
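The single-lesion association test and the multiple-testing adjustment named above can be sketched as follows. This is a minimal illustration, assuming a two-sided Fisher's exact test on a 2 x 2 table of lesion status against erlotinib sensitivity; the function names and the example counts are hypothetical and are not the counts behind the reported P values.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]].

    Rows: lesion present / absent. Columns: sensitive / insensitive.
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability of x sensitive lines among lesion carriers.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(
        p for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # tolerance for floating-point ties
    )
    return min(1.0, p_value)


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value for one of `n_tests` hypotheses."""
    return min(1.0, p_value * n_tests)


# Hypothetical counts: 5 of 6 lesion carriers sensitive, 1 of 40 non-carriers sensitive.
p_raw = fisher_exact_two_sided(5, 1, 1, 39)
p_adjusted = bonferroni(p_raw, n_tests=50)
```

A lesion would count as a significant predictor only if its adjusted P value stays below the chosen significance level after multiplying by the number of lesions tested.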

Figure 4

Hierarchical clustering of compound activity uncovers mutated EGFR
as a target for dasatinib activity. (A) Displayed is a hierarchical cluster
of cell lines and compounds, clustered according to GI50 values (red,
high compound activity; white, low compound activity) after logarithmic
transformation and normalization. 77 cells reached full compound
coverage. The presence (black) or absence (gray) of selected lesions is
annotated in the right panel. (B) Correlation of activity of compounds to
presence of amplifications (red) and deletions (blue) as well as
oncogene mutations (mut) was used for hierarchical clustering. Putative
target genes inside and bordering (*) the region defined by GISTIC are
annotated. (C) Upper panel shows the binding mode of erlotinib (white)
to WT EGFR. Dasatinib (pink) is modeled into the ATP-binding site of
EGFR. The 2-amino-thiazole forms 2 hydrogen bonds with the hinge
region of the kinase. Lower panel shows that the chloro-methyl-phenyl
ring of dasatinib binds to a hydrophobic pocket near the gatekeeper
Thr790 and helix C and will clash with the Met side chain of the EGFR
drug-resistance mutation T790M. (D) Upper panel shows that Ba/F3
cells ectopically expressing mutant EGFR with (delEx19 + T790M) or
without (delEx19) the T790M mutation were treated for 12 hours with
either dasatinib or erlotinib, and phospho-EGFR and EGFR levels
were detected by immunoblotting. Lower panel shows that the same
cells were treated for 96 hours with either dasatinib or erlotinib and
viability was assessed. Growth inhibition relative to untreated cells (y
axis) is shown as a function of compound concentrations.

Differential activity of compounds in clinical development in NSCLC cell
lines. Having validated the cell-line collection by demonstrating its
genomic and phenotypic similarity to primary NSCLC tumors,
we reasoned that adding complex phenotypic data might elicit
additional insights into the impact cancer genotypes have on cell
biology phenotypes. In our initial pilot screening experiment, we
profiled all cell lines against erlotinib and subsequently extended
our assay to 11 additional inhibitors that were either under
clinical evaluation or showed high activity in preclinical models; these
compounds target a wide spectrum of relevant proteins in cancer
(Supplemental Figure 4). We treated all cell lines with these
compounds and determined GI50 values (GI25 for rapamycin;
Supplemental Table 5). The resulting sensitivity patterns (Figure 3) revealed
that while some of the compounds exhibited a pronounced
cytotoxic activity in a small subset of cell lines (e.g., erlotinib,
vandetanib, VX-680), others were active in most of the cell lines, with only
a minority being resistant [e.g.,
17-(allylamino)-17-demethoxygeldanamycin (17-AAG)]. Only 2 cell lines (<2%) were resistant to all
of the compounds (Supplemental Table 5), suggesting that most
NSCLC tumors might be amenable to targeted treatment. Overall,
these observations are highly reminiscent of patient responses in
clinical trials in which limited subsets of patients experience
partial and, rarely, complete response while the majority of patients
exhibit stable disease, no change, or progression.
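How a GI50 value might be read off a dose-response series can be sketched as follows. The automated procedure actually used (43) is not described in this text, so this assumes simple log-linear interpolation between the two tested concentrations that bracket 50% growth; the function name and the example series are hypothetical.

```python
from math import log10


def gi50_by_interpolation(concentrations, growth_percent):
    """Estimate the concentration giving 50% growth relative to control.

    `concentrations` must be ascending and positive (same unit throughout);
    `growth_percent` holds growth relative to untreated cells (100 = no
    inhibition). Returns None when growth never crosses 50% within the
    tested range, e.g. for a resistant cell line.
    """
    points = list(zip(concentrations, growth_percent))
    for (c_low, g_low), (c_high, g_high) in zip(points, points[1:]):
        if g_low >= 50.0 >= g_high and g_low != g_high:
            # Position of the 50% crossing between the two bracketing doses.
            fraction = (g_low - 50.0) / (g_low - g_high)
            log_c = log10(c_low) + fraction * (log10(c_high) - log10(c_low))
            return 10 ** log_c
    return None


# Hypothetical dose-response series in μM (not study data).
doses = [0.01, 0.1, 1.0, 10.0]
sensitive_line = gi50_by_interpolation(doses, [100.0, 75.0, 25.0, 5.0])
resistant_line = gi50_by_interpolation(doses, [100.0, 95.0, 90.0, 80.0])
```

Interpolating on a logarithmic concentration axis matches the way dilution series are usually spaced; a fitted sigmoid curve would be a more robust alternative when replicate noise is large.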

Identification of relevant compound targets by similarity profiling. As
an initial approach to identifying shared targets of inhibitors, we
performed hierarchical clustering based on the similarity of sensitivity
profiles (Figure 4A) and on the correlation between sensitivity and
genomic lesion profiles (Figure 4B). Erlotinib and vandetanib exhibited
the highest degree of similarity, pointing to mutant EGFR as the
critical target of vandetanib in NSCLC tumor cells (Figure 4, A and B)
(47, 48). The high degree of correlation (r = 0.91; P < 0.001) of
cell-line GI50 values for both compounds as well as structural modeling
of vandetanib binding in the EGFR kinase domain, which revealed a
binding mode identical to that of erlotinib, further corroborate this
notion (Supplemental Figure 5A). This model predicted that binding of
both compounds would be prevented by the T790M resistance mutation of
EGFR (48-50); accordingly, murine Ba/F3 cells ectopically expressing
erlotinib-sensitizing mutations of EGFR together with T790M (51) were
completely resistant to erlotinib and vandetanib (Supplemental Figure 5,
B and C).
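
The similarity measure behind this kind of profiling can be sketched in a few lines. The example below is only an illustration, assuming Pearson correlation of log-transformed GI50 profiles as the similarity; the function names and all numeric values are hypothetical and are not taken from the study's data or software.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def most_similar_pair(profiles):
    """Return (r, name_a, name_b) for the compound pair with the highest r."""
    names = sorted(profiles)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if best is None or r > best[0]:
                best = (r, a, b)
    return best

# Hypothetical log10(GI50) values for five cell lines (illustrative only).
profiles = {
    "erlotinib":  [-1.5, 0.9, 1.0, -1.2, 0.8],
    "vandetanib": [-1.1, 0.8, 1.0, -0.9, 0.7],
    "17-AAG":     [-0.9, -1.0, 0.2, -0.8, -1.1],
}
r, first, second = most_similar_pair(profiles)
print(first, second, round(r, 3))
```

A hierarchical clustering would use 1 - r (or a similar quantity) as the distance between compounds; only the pairwise step is shown here.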

In addition to the ERBB2/EGFR inhibitor lapatinib, vandetanib, and the
irreversible EGFR inhibitor PD 168393 (52), the SRC/ABL (where SRC is
defined as v-src sarcoma [Schmidt-Ruppin A-2] viral oncogene homolog
[avian]) inhibitor dasatinib (53) shared a cluster with the EGFR
inhibitor erlotinib, although at a much lower potency than erlotinib
(Figure 4, A and B). Molecular modeling of dasatinib binding to EGFR
predicted a binding mode similar to that of erlotinib (Figure 4C), with
a steric clash of erlotinib and dasatinib with the erlotinib resistance
mutation T790M (49, 50, 54, 55) (Figure 4C). We therefore formally
validated EGFR as a relevant dasatinib target in tumor cells by showing
cytotoxicity as well as EGFR dephosphorylation (56) elicited by this
compound in Ba/F3 cells ectopically expressing mutant EGFR but not in
those coexpressing the T790M resistance allele (Figure 4D). Thus,
large-scale phenotypic profiling coupled to computational prediction
formally validated a relevant tumor-cell target of an FDA-approved drug
using a systematic unbiased approach. It is noteworthy that a trial of
dasatinib in patients with acquired erlotinib resistance is currently
ongoing (trial ID: NCT00570401;
http://clinicaltrials.gov/ct2/show/NCT00570401?term=NCT00570401&rank=1);
based on previously reported biochemical findings (54) and our results,
we predict limited clinical activity in those patients in whom erlotinib
resistance is due to the EGFR resistance mutation T790M.

Supervised  learning  identifies  predictors  for  inhibitor  responsiveness.  We 
have  shown  that  hierarchical  clustering  can  identify  compounds 
with  overlapping  target  specificities  within  a  screening  experiment. 
We  now  set  out  to  extend  our  analyses  to  additional  computational 
approaches  to  predict  inhibitor  responsiveness  from  global  lesion 
data  in  a  systematic  fashion.  To  this  end,  we  applied  supervised 
learning  methods  as  we  did  for  erlotinib  (see  above).  Applying  this 
method,  we  identified  robust,  genetic  lesion-based  predictors  for 
the  majority  of  the  tested  compounds  (Supplemental  Table  7). 

U0126 is a MEK inhibitor that also showed enhanced activity in a subset
of the lung cancer cell-line collection. Here, the supervised approach
identified chromosomal gains of 1q21.3 affecting the genes ARNT and
RAB13 as being robustly associated with U0126 sensitivity (Fisher’s
exact test, copy number threshold 2.14, P = 0.02; Supplemental Figure 6
and Supplemental Table 7). In order to validate this finding in an
independent data set, we made use of the NCI-60 cancer cell-line panel
(57) in which hypothemycin was used as a MEK inhibitor (12). This
cross-platform validation revealed that 1q21.3 gain predicted
sensitivity to MEK inhibition in both data sets (Fisher’s exact test,
P = 0.03, NCI-60 collection; Supplemental Figure 6).

In our initial cluster analysis, we found that KRAS mutations correlated
with sensitivity to the Hsp90 inhibitor 17-AAG, a geldanamycin
derivative (Figure 4B). Recapitulating this observation, we found KRAS
mutations to be predictive of 17-AAG sensitivity, even when applying our
KNN-based prediction approach (Fisher’s exact test, P = 0.029; Figure 5A
and Supplemental Table 7). Confirming this observation in an independent
cell-line model, we found the distribution of geldanamycin sensitivity
and KRAS mutation in the NCI-60 cell-line collection to be strikingly
similar to that observed in our panel (P = 0.049; Figure 5A).

In 17-AAG-sensitive cells, Hsp90 inhibition led to robust induction of
apoptosis (Supplemental Figure 7A). In order to gain mechanistic insight
into KRAS dependency on Hsp90 chaperonage, we first confirmed the
specificity of our KRAS antibody (Supplemental Figure 7C). Using
conditions under which EGFR coprecipitated with Hsp90 in EGFR-mutant
cells (Supplemental Figure 7B) (58), we found KRAS to be bound to Hsp90
as well (Figure 5B). However, while 17-AAG treatment depleted mutant
EGFR from Hsp90 (Supplemental Figure 7B), KRAS binding to Hsp90 was not
affected by this treatment (Figure 5B). Furthermore, cellular KRAS
protein levels were also not reduced by 17-AAG (Figure 5B). These
findings are surprising, as other oncogenes, such as EGFR or BRAF, known
to be dependent on Hsp90 chaperonage are depleted from the complex after
treatment with 17-AAG (58, 59). However, reduction of viability of
KRAS-mutant cells treated with 17-AAG is accompanied by depletion of
c-RAF and AKT (60) (Figure 5B). Since both c-RAF and AKT are known Hsp90
clients (59, 61), we hypothesize that this observation might rely on the
activation of the AKT and RAF/MEK/ERK signaling pathways by mutant KRAS
(62, 63).

Figure 5

KRAS mutations predict response to inhibition of Hsp90 in vitro and in
vivo. (A) The sensitive and resistant cell lines were sorted according
to their GI50 values and annotated for the presence of KRAS mutations
(asterisks and black columns). Bar height represents the respective GI50
values. The association of KRAS mutations and 17-AAG sensitivity
(GI50 < 0.07 = sensitive; GI50 > 0.83 µM = resistant; according to the
lower and upper 25th percentiles) was calculated by Fisher’s exact test
for the lung cancer data set (upper panel) and for the NCI60 data set
(lower panel). (B) Upper panel shows that whole-cell lysates of the
indicated KRAS WT and KRAS mutated cell lines treated with different
concentrations of 17-AAG were analyzed for levels of c-RAF, KRAS, cyclin
D1, and AKT by immunoblotting. Lower panel shows that extracts of the
indicated cells treated with either control (C) or 0.5 µM (H322 and
Calu-6) or 1 µM (H2122) of 17-AAG were subjected to
coimmunoprecipitation with antibodies to either KRAS (top) or Hsp90
(bottom); immunoconjugates were analyzed for levels of Hsp90 (top) or
KRAS (bottom) by immunoblotting. Noncontiguous bands run on the same gel
are separated by a black line (H2122). WB, Western blot. (C) Displayed
are coronal MRI scans of lox-stop-loxKRASG12D mice before and after 7
days of treatment with either 17-DMAG or vehicle. The areas of lung
tumors were manually segmented and measured on each magnetic resonance
slice, and total tumor volume reduction was calculated for all mice
treated with 17-DMAG (n = 4) and placebo (n = 3). SD of tumor volume in
the cohort of treated and untreated mice was calculated and is depicted
as error bars.

To further validate the power of KRAS mutations to predict response to
Hsp90 inhibition, we employed a lox-stop-loxKRASG12D mouse model that
enables the study of KRAS-driven lung adenocarcinomas in vivo (64). Mice
with established lung tumors induced by nasal inhalation of adenoviral
Cre (64) were treated with either the water-soluble geldanamycin Hsp90
inhibitor 17-(dimethylaminoethylamino)-17-demethoxygeldanamycin
(17-DMAG) or placebo. Whereas no tumor shrinkage was observed in the
placebo-treated mice after 1 week of treatment (Figure 5C and
Supplemental Figure 8), substantial regression of established tumors was
observed in 3 out of 4 mice receiving 17-DMAG, with a tumor volume
reduction of up to 80% (Figure 5C and Supplemental Figure 8). Although
responses were transient, like those seen in 17-DMAG-treated transgenic
mice with EGFR-driven lung carcinomas (data not shown), these findings
validate our observation that KRAS mutation predicts response to Hsp90
inhibition in vivo.

Figure 6

Identification of functionally relevant targets for dasatinib activity.
(A) Left panel shows that cell lines with copy number gain involving at
least 1 gene encoding a dasatinib target are labeled with asterisks and
black columns. The probability of these cells being dasatinib sensitive
was calculated by Fisher’s exact test. In the right panel, dasatinib
GI50 values are shown as box plots (representing the 25th to 75th
percentile; whisker representing the 95th percentile; dots representing
outliers) for cell lines with (TESP+ 1 gene) and without (TESP- 1 gene)
copy number gain of dasatinib target genes (Wilcoxon test). (B) H322M
cells harboring amplified SRC were either left untreated or transduced
with an empty vector control (H322Mcont) or with shRNA targeting SRC
(H322MSRCkd). After puromycin selection, levels of SRC in H322M cells
transduced with the indicated vectors were analyzed by immunoblotting
(top). The H322MSRCkd lanes were run on the same gel but were
noncontiguous, as indicated by the white line. Viability was quantified
by cell counting. Error bars represent SD between different experiments.
(C) H322M cells were transduced with vectors encoding either active SRC
or active SRC with a gatekeeper mutation SRC (T341M). Stable cells were
treated with dasatinib for 96 hours. Viability is shown as percentage of
untreated controls. Error bars indicate SD of 3 independent experiments.
(D) Dasatinib-sensitive (TESP+; H322M) or -resistant cells (TESP-; A549)
were grown s.c. in nude mice. After 14 days of treatment (vehicle,
dasatinib), tumor volumes were measured as diameters. SD of tumor volume
in the cohort of treated and untreated mice was calculated and is
depicted as error bars.

Compound target gene enrichment predicts sensitivity. We have used
similarity profiling and supervised learning approaches that led to the
identification of predictive markers based on significant lesions found
in our data set as defined by GISTIC. However, statistically defining
relevant lesions in a given data set, for all its advantages, limits the
utility of lesions occurring at low frequency and/or amplitude as
predictors of compound sensitivity. We therefore developed an additional
approach, denoted Target-Enriched Sensitivity Prediction (TESP), which
enables inclusion of statistically underrepresented yet biologically
relevant lesions.

Amplification of drug-target genes has been demonstrated to predict
vulnerability to target-specific compounds in ERBB2-amplified breast
cancer and EGFR-amplified lung cancer (1, 46). We therefore speculated
that chromosomal copy number alterations of biochemically defined drug
targets could be used for prediction of sensitivity to other tyrosine
kinase inhibitors as well. To this end, we used tyrosine kinase
inhibitor targets defined by the quantitative dissociation constant as
determined in quantitative kinase assays (65). As a proof of principle,
we determined whether copy number gain in EGFR is associated with
sensitivity to erlotinib (40). In our systematic approach, cell lines
inhibited by erlotinib at clinically achievable dosages (up to 1 µM)
were highly enriched for amplification of EGFR (P = 0.00023;
Supplemental Figure 9A). We next tested our prediction model for
lapatinib, a specific inhibitor of ERBB2 and EGFR, clinically approved
for ERBB2-positive breast cancer (66). Again, we observed cell lines
inhibited by lapatinib (n = 82) below the clinically achievable dosage
of 1 µM to be significantly enriched in the subgroup of cell lines with
amplification of ERBB2 or EGFR (Fisher’s exact test, P = 0.009; data not
shown). Thus, TESP enables discovery of clinically relevant
genotype-phenotype relationships.

Encouraged by these findings, we set out to test our approach for
compounds inhibiting a wide range of kinases, such as dasatinib (65). We
determined the distribution of GI50 values of cell lines with
chromosomal copy number gain (copy number > 3) affecting at least 1 or
at least 2 of the genes encoding the most biochemically sensitive
dasatinib targets and compared these to the distribution of GI50 values
of cells without copy number gain at these genomic positions (Figure 6A,
Supplemental Table 8, and Supplemental Figure 9B). As hypothesized,
these groups were significantly distinct in the distribution of GI50
values (P = 1.8 × 10⁻³ when 1 gene was affected and P = 4.6 × 10⁻³ when
2 of the target genes were affected by copy number gain; Figure 6A and
Supplemental Figure 9B). In particular, this predictor comprised copy
number gain at the loci of gene family members of ephrin receptor
kinases (EPHA3, EPHA5, and EPHA8), SRC kinases (SRC, FRK, YES1, LCK, and
BLK), and ABL2, suggesting that NSCLC cells harboring such lesions might
be exquisitely sensitive to therapeutic inhibition of the encoded
proteins. The probability that cell lines with copy number gain at
either 1 or 2 of these genes will be sensitive to dasatinib treatment
(GI50 < 100 nM) increases up to 5.6-fold (gain of 1 gene) and 15.8-fold
(gain of 2 genes), respectively, when compared with cells without copy
number gain at these loci (Figure 6A and Supplemental Figure 9B). In
contrast, copy number gain involving loci encoding biochemically less
sensitive dasatinib targets failed to show enrichment of sensitive cell
lines (data not shown).
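
The core of the TESP comparison can be illustrated with a short sketch. It assumes only what the text states (a copy number gain threshold of 3 and a sensitivity cutoff of GI50 < 100 nM); the function names, the cell-line labels, and every number in the example are hypothetical, and the original analysis may differ in detail.

```python
def tesp_flags(copy_numbers, target_genes, gain_threshold=3.0, min_genes=1):
    """Flag cell lines with copy number gain in at least `min_genes` targets.

    `copy_numbers` maps cell line -> {gene: copy number}; genes without a
    measurement are treated as diploid (an assumption of this sketch).
    """
    flags = {}
    for line, cn in copy_numbers.items():
        gained = sum(1 for g in target_genes if cn.get(g, 2.0) > gain_threshold)
        flags[line] = gained >= min_genes
    return flags

def fold_enrichment(flags, gi50_nm, sensitive_below=100.0):
    """Sensitive fraction among flagged lines divided by that among the rest.

    Raises ZeroDivisionError if a group is empty or no unflagged line is
    sensitive; a real analysis would handle those cases explicitly.
    """
    def sensitive_fraction(selected):
        lines = [l for l, f in flags.items() if f == selected]
        return sum(gi50_nm[l] < sensitive_below for l in lines) / len(lines)
    return sensitive_fraction(True) / sensitive_fraction(False)

targets = ["SRC", "EPHA3", "ABL2"]          # subset of the genes named above
copy_numbers = {                            # hypothetical values
    "line_A": {"SRC": 4.2}, "line_B": {"EPHA3": 3.5},
    "line_C": {"SRC": 2.0}, "line_D": {"SRC": 2.1, "EPHA3": 1.9},
    "line_E": {},           "line_F": {"ABL2": 2.4},
}
gi50_nm = {"line_A": 20, "line_B": 500, "line_C": 50,
           "line_D": 3000, "line_E": 8000, "line_F": 900}

flags = tesp_flags(copy_numbers, targets)
print(fold_enrichment(flags, gi50_nm))
```

Setting `min_genes=2` gives the stricter "gain of 2 genes" grouping; the significance of the resulting 2x2 split would then be assessed separately, for example with Fisher's exact test.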

In cells with copy number gain of biochemically defined dasatinib target
genes, dasatinib treatment led to robust induction of apoptosis (data
not shown). Importantly, copy number gain of at least one of these genes
is present in 12.9% (copy number > 3) of several hundred primary lung
adenocarcinomas (15) (data not shown), thus emphasizing the potential
clinical relevance of our predictor.

In the dasatinib-sensitive cell line H322M harboring amplified SRC,
dasatinib treatment led to dephosphorylation of SRC at low nanomolar
doses, paralleling growth inhibition at similar concentrations
(Supplemental Figure 9C). In order to determine whether the genes in our
dasatinib predictor are causatively linked with the activity of
dasatinib, we silenced SRC by lentiviral shRNA in H322M cells (Figure
6B). When compared with parental cells or cells expressing the control
vector, H322M-SRC-knockdown (H322MSRCkd) cells showed a massive
reduction in cellular proliferation (Figure 6B) and increase in cell
death (data not shown). In order to further validate activated SRC as
the relevant dasatinib target in H322M cells, we expressed an activated
allele of SRC together with a sterically demanding mutation at the
gatekeeper position of the ATP-binding pocket (T341M) (67); this
mutation and the analogous mutations in Bcr-Abl and EGFR (see above)
induce on-target drug resistance (67) by displacing the compound from
the ATP-binding pocket. As hypothesized, expression of the T341M
gatekeeper mutation but not of SRC alone rescued dasatinib-induced cell
death in H322M cells (Figure 6C). These results formally validate SRC as
the relevant dasatinib target in SRC-amplified NSCLC cells.

We  also  validated  EPHA3  as  a  relevant  target  in  H28  cells  with 
gain  of  EPHA3  by  showing  decreased  viability  of  these  cells  upon 
stable  knockdown  of  EPHA3  (Supplemental  Figure  10). 

We next transplanted cells with or without copy number gain of SRC into
nude mice. Mice were treated with either dasatinib or placebo on a daily
application schedule. Again confirming our in vitro observations, robust
tumor shrinkage was observed in mice transplanted with cells harboring
copy number gain of SRC (H322M) (Figure 6D) receiving dasatinib. In
contrast, no tumor shrinkage was observed in mice transplanted with
cells predicted to be resistant against dasatinib (A549) and in all mice
treated with placebo (Figure 6D). We consistently failed to grow
EPHA3-amplified H28 cells in nude mice; HCC515 cells were therefore
chosen as another model of NSCLC with gain of EPHA3. Dasatinib treatment
of established HCC515 tumors also induced significant tumor shrinkage
(data not shown).

Together,  these  results  show  that  in  NSCLC,  copy  number  gain 
of  ephrin  receptor  or  SRC  family  member  genes  and  ABL2  may 
render  tumor  cells  dependent  on  these  kinases,  thus  exposing  a 
vulnerability  to  therapeutic  inhibition  with  dasatinib. 

Discussion 

Here, we show that diverse analytical approaches of multiple orthogonal
genomic and chemical perturbation data sets pertinent to a large
collection of cancer cell lines afford insights into how somatic genetic
lesions impact cell biology and therapeutic response in cancer. Such
data sets provide a rich source for different computational approaches
that each yield complementary, accurate, and valid predictors of
inhibitor sensitivity. The basis for such predictions is a panel of
genomically annotated NSCLC cell lines that is representative of the
genetic diversity, the transcriptional profile, and the phenotypic
properties of primary NSCLC tumors. The overall functional biological
validity of our approach is supported by the observation that EGFR
mutations are the strongest predictor of sensitivity to the EGFR
inhibitor erlotinib. Others have similarly observed high activity of
EGFR inhibitors in EGFR-mutant NSCLC cell lines (6, 13, 68), supporting
the validity of our unbiased computational approach employing systematic
global measurements of genetic lesions.

Applying systematic similarity profiling using computationally defined
significant genetic lesions, we also identified predictors for compounds
currently in clinical use or trials. Specifically, in an unbiased
manner, we confirmed EGFR mutations not only to predict sensitivity to
EGFR inhibitors (erlotinib, PD 168393, vandetanib) (6-8, 47, 52) but
also to the SRC/ABL inhibitor dasatinib (54, 56). We formally
demonstrated that EGFR is the relevant target of dasatinib in
EGFR-mutant cells by showing the lack of activity of this compound in
Ba/F3 cells expressing the T790M resistance allele of EGFR. Thus,
exploring multiple orthogonal genomic and chemical data sets enabled the
formal definition of a relevant tumor-cell target of an FDA-approved
drug.

In addition, we performed supervised identification of predictors for
drug sensitivity. A noteworthy finding is the role of KRAS mutation as a
predictor of sensitivity to 17-AAG. Independent validation of the
predictor for an Hsp90 inhibitor in a transgenic murine lung cancer
model strengthens the robustness of our approach. Given the high
prevalence of cancer patients with mutated KRAS and their unfavorable
prognosis, this finding might be of clinical importance, as Hsp90
inhibitors (e.g., 17-AAG, IPI-504, NVP-AUY922) are currently under
clinical evaluation.

Finally, our compound target-enrichment approach for prediction of
sensitivity led to the observation of exquisite vulnerability of cells
with copy number gain of ephrin receptor and SRC family genes as well as
ABL2 to dasatinib treatment. As a proof of principle, we validated our
prediction model in great depth for the relevance of SRC amplification
for dasatinib activity in vitro and in vivo. Thus, copy number gain
affecting one of these genes may render tumor cells dependent on the
encoded kinases, thereby defining potential biomarkers for successful
treatment of NSCLC patients with dasatinib, an FDA-approved drug.

The Journal of Clinical Investigation http://www.jci.org Volume 119 Number 6 June 2009

In summary, we have established a genomically, phenotypically, and
functionally validated tool for studying drug activity mechanisms in the
laboratory. Our results strengthen the notion that multiple orthogonal
data sets pertinent to large cancer cell-line collections may offer an
as-yet-unmatched potential for exploring the cell-biological impact of
novel compounds in genomically defined cancer types. Such cell-line
collections may advance molecularly targeted treatment of cancer by
providing a tool for preclinical molecular drug target validation on the
basis of the genetic lesion signature characteristic of individual
tumors.

Methods 

Cells. The cell-line collection generated by A.F. Gazdar, J. Minna, and
colleagues (69, 70) formed the basis of this collection. Further cell
lines were obtained from ATCC, DSMZ (German Collection of Microorganisms
and Cell Cultures, Germany), and our own or other cell culture
collections. Details on all cell lines are listed in Supplemental Table
1, including providers and culture conditions. Cells were routinely
controlled for infection with mycoplasma by MycoAlert (Cambrex) and were
treated with antibiotics according to a previously published protocol
(71) in case of infection.

SNP arrays. Genomic DNA was extracted from cell lines using the Puregene
kit (QIAGEN) and hybridized to high-density oligonucleotide arrays
(Affymetrix) interrogating 238,000 SNP loci on all chromosomes except Y,
with a median intermarker distance of 5.2 kb (mean 12.2 kb). Array
experiments were performed according to the manufacturer’s instructions.
SNPs were genotyped by the Affymetrix Genotyping Tools software, version
2.0. SNP array data of 371 primary samples were obtained from the Tumor
Sequencing Project (processed data file viewable in GenePattern’s SNP
viewer: dataset.snp; http://www.broad.mit.edu/cancer/pub/tsp/) (15). We
applied what we believe is a novel and general method for GISTIC (14) to
analyze the data sets. In brief, each genomic marker was scored
according to an integrated measure of the prevalence and amplitude of
copy number changes (and only prevalence in the case of LOH), and the
statistical significance of each score was assessed by comparison with
the results expected from the background aberration rate alone. The
GISTIC algorithm was run using 2 different pairs of copy number
thresholds: copy numbers of 4 (amplifications) and 1 (deletions), and
copy numbers of 2.14 (amplifications) and 1.87 (deletions), to reflect
focal and broad events, respectively. For the sake of simplicity, we
refer to these settings using only the amplification threshold.
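
The scoring idea summarized above (prevalence combined with amplitude, compared against a background expectation) can be illustrated with a toy calculation. This is not the GISTIC implementation: the log2 amplitude scale, the within-sample permutation null, and all function names are assumptions made for this sketch only.

```python
import math
import random

def g_scores(cn, threshold=2.14):
    """Per-marker amplification score: prevalence times mean log2 amplitude.

    `cn[s][m]` is the copy number of sample s at marker m. Only values
    above `threshold` count as amplified.
    """
    n_samples, n_markers = len(cn), len(cn[0])
    scores = []
    for m in range(n_markers):
        amps = [math.log2(row[m] / 2.0) for row in cn if row[m] > threshold]
        # prevalence * mean amplitude == sum of amplitudes / number of samples
        scores.append(sum(amps) / n_samples)
    return scores

def permutation_p_values(cn, threshold=2.14, n_perm=200, seed=0):
    """Empirical P value per marker from marker-shuffled background data."""
    rng = random.Random(seed)
    observed = g_scores(cn, threshold)
    null = []
    for _ in range(n_perm):
        # Shuffle marker positions independently within every sample.
        shuffled = [rng.sample(row, len(row)) for row in cn]
        null.extend(g_scores(shuffled, threshold))
    return [sum(s >= obs for s in null) / len(null) for obs in observed]

# Hypothetical data: 3 samples x 4 markers, recurrent gain at marker 1.
cn = [[2.0, 4.0, 2.0, 2.0],
      [2.0, 4.0, 2.0, 2.0],
      [2.0, 8.0, 2.0, 2.0]]
print(g_scores(cn))
print(permutation_p_values(cn))
```

Running the same functions with a threshold of 4 instead of 2.14 mimics the focal setting described in the text; deletions would be scored analogously with the inequality reversed.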

Detection of homozygous deletions. For identification of homozygous
deletions, SNP data were filtered for 5 coherent SNPs exhibiting copy
numbers of less than 0.5. The analysis was focused on focal losses,
excluding entire chromosomal arms. Information about genes located in a
region of homozygous deletion was based on the hg17 build of the human
genome sequence from the University of California Santa Cruz
(http://genome.ucsc.edu).
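
The filter described above can be sketched as a simple run detector. The sketch assumes that "coherent" means consecutive along the chromosome and that the input is an ordered list of per-SNP copy numbers for one chromosome; the function name is hypothetical, and the exclusion of whole-arm losses is not implemented.

```python
def homozygous_deletion_runs(copy_numbers, max_cn=0.5, min_run=5):
    """Return inclusive (start, end) index pairs of runs of at least
    `min_run` consecutive SNPs with copy number below `max_cn`."""
    runs, start = [], None
    for i, cn in enumerate(copy_numbers):
        if cn < max_cn:
            if start is None:
                start = i                      # a candidate run begins
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(copy_numbers) - start >= min_run:
        runs.append((start, len(copy_numbers) - 1))
    return runs

# Hypothetical profile: one run of 5 low SNPs, one run of only 2.
profile = [2.0, 1.9, 0.3, 0.2, 0.1, 0.4, 0.3, 2.1, 0.2, 0.3, 1.8]
print(homozygous_deletion_runs(profile))
```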

Analysis of cooccurring lesions. The analysis was performed by computing
ratios of observed versus expected cooccurrence frequency of individual
lesions. Hierarchical clustering of mutation data combined with
dichotomized quantitative copy number changes was performed using the
reciprocal cooccurrence ratio as distance measure with the average
linkage method. As the adequate threshold for occurrence of copy number
lesions depends on the overall level of copy number alteration for that
specific lesion, the sum of these ratios for 3 distinct thresholds was
used.
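
A minimal sketch of the cooccurrence ratio follows, assuming that the expected frequency is the product of the two marginal lesion frequencies (that is, independence) and that lesions are already dichotomized per cell line. The function names and the example vectors are hypothetical.

```python
def cooccurrence_ratio(has_a, has_b):
    """Observed / expected cooccurrence frequency of two binary lesions.

    Expected frequency assumes independence: freq(A) * freq(B).
    """
    n = len(has_a)
    observed = sum(1 for a, b in zip(has_a, has_b) if a and b) / n
    expected = (sum(map(bool, has_a)) / n) * (sum(map(bool, has_b)) / n)
    return observed / expected

def cooccurrence_distance(has_a, has_b):
    """Reciprocal ratio as a distance; never-cooccurring lesions are
    treated as infinitely distant (a choice made for this sketch)."""
    ratio = cooccurrence_ratio(has_a, has_b)
    return float("inf") if ratio == 0 else 1.0 / ratio

# Hypothetical lesion calls across 8 cell lines (1 = lesion present).
lesion_a = [1, 1, 1, 0, 0, 0, 0, 0]
lesion_b = [1, 1, 0, 1, 0, 0, 0, 0]
print(cooccurrence_ratio(lesion_a, lesion_b))
```

A ratio above 1 indicates that the two lesions occur together more often than expected by chance; the summation over 3 copy number thresholds mentioned in the text would repeat this calculation per threshold and add the results.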

Mutation detection. Mutation status of known oncogene mutations in the
genes EGFR, BRAF, ERBB2, PIK3CA, NRAS, KRAS, ABL1, AKT2, CDK4, FGFR1,
FGFR3, FLT3, HRAS, JAK2, KIT, PDGFRA, and RET was determined by
mass-spectrometric genotyping. Mutation status of these genes for all
cell lines was published previously (22). In addition, the genes EGFR,
BRAF, ERBB2, PIK3CA, KRAS, TP53, STK11, PTEN, and CDKN2A were
bidirectionally sequenced following PCR amplification of all coding
exons.

Expression arrays. Expression data for 54 of the cell lines were
obtained using Affymetrix U133A arrays. RNA extraction, hybridization,
and scanning of arrays were performed using standard procedures (35).
CEL files from U133A arrays were preprocessed using the dChip software
(http://biosun1.harvard.edu/complab/dchip/; build date May 5, 2008). We
compared the cell lines with cell lines and primary tumors from lung
cancer (28), renal cell carcinomas (29, 72), and lymphoma (30, 73) data
sets obtained from GEO (http://www.ncbi.nlm.nih.gov/geo/) by
hierarchical clustering. Data were processed by standard procedures;
normalization was performed in dChip. For comparison of NSCLC cell lines
(U133A) and primary tumors, we used data on adenocarcinomas from
Bhattacharjee and colleagues generated on U95Av2 arrays (35). We
selected genes that we found differentially expressed between cell lines
with mutant EGFR and WT EGFR (fold change between groups > 2, 90% CI;
absolute difference > 100, P < 0.01) and between erlotinib-sensitive and
erlotinib-resistant cell lines (erlotinib-sensitive [GI50 < 0.1 µM] vs.
erlotinib-resistant [GI50 > 2 µM], fold change > 2, 90% CI; absolute
difference > 100, P < 0.005). For principal component analysis, the R
language for statistical computing was used. Variable transcripts were
identified using the following filtering criteria: coefficient of
variation 1.9 through 10, 40% present call rate. The first principal
component described 14.5% of the overall variance, the second 9.6%, and
the third 8.2%. Using a cutoff of 1400 in the eigenvalue, samples were
grouped according to the first principal component.

Cell-based screening. All compounds were purchased from commercial
suppliers or synthesized in house, dissolved in DMSO, and stored at
-80 °C. Cells were plated into sterile microtiter plates using a
Multidrop instrument (Thermo Scientific) and cultured overnight.
Compounds were then added in serial dilutions. Cellular viability was
determined after 96 hours by measuring cellular ATP content using the
CellTiter-Glo Assay (Promega). Plates were measured on a Mithras LB 940
Plate Reader (Berthold Technologies). GI50 values were determined from
the preimage under the growth inhibition curve, where the latter was
smoothed according to the logistic function with the parameters
appropriately chosen. For these analyses, we have established a
semiautomated pipeline as what we believe to be a novel R package (43).
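
The "preimage" step can be made concrete with a short sketch. It assumes a four-parameter logistic curve whose parameters have already been fitted; the fitting itself is omitted, the parameter names are hypothetical, and the example values are not taken from the screening data. The sketch is in Python purely for illustration and is unrelated to the R package mentioned above.

```python
def logistic_viability(conc, top, bottom, mid, slope):
    """Four-parameter logistic curve: viability (%) at concentration `conc`."""
    return bottom + (top - bottom) / (1.0 + (conc / mid) ** slope)

def preimage(level, top, bottom, mid, slope):
    """Concentration at which the curve crosses `level` (e.g. 50 for GI50).

    Returns None when the fitted curve never reaches `level`, i.e. when
    `level` lies outside the open interval (bottom, top).
    """
    if not (bottom < level < top):
        return None
    # Solve level = bottom + (top - bottom) / (1 + (c / mid) ** slope) for c.
    return mid * ((top - level) / (level - bottom)) ** (1.0 / slope)

# Hypothetical fitted parameters for one compound on one cell line.
params = dict(top=100.0, bottom=20.0, mid=0.5, slope=1.2)
gi50 = preimage(50.0, **params)   # concentration giving 50% viability
gi25 = preimage(75.0, **params)   # concentration giving 25% inhibition
print(gi50, gi25)
```

Returning `None` for curves that never cross the requested level corresponds to cell lines for which only a less stringent measure such as GI25 can be reported.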

Lesion-based prediction of compound sensitivity. For lesion-based
prediction of sensitivity, 3 different approaches were applied. First,
the most sensitive and most resistant samples were chosen according to
their sensitivity profile. Where the sensitivity profile of the
corresponding compound did not allow a clear distinction between
resistant and sensitive cell lines, groups were defined by the 25th and
75th percentiles. We used Fisher’s exact test to evaluate the
association between the activity of the compound and the presence of
significant lesions as defined by GISTIC. For this purpose, the
cell-line panel was divided according to the presence of each lesion.
The logarithmically transformed GI50 values pertinent to each group were
then compared by a 2-sample Welch’s t test. In order to avoid an
artificially low variance, the Welch’s t tests were based on a fixed
variance determined as the mean of the variances that were clearly
distinct from zero (>0.1). Details of this procedure are presented in
the publication by Solit and colleagues (12).
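
As an illustration of the comparison step, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for two groups of log-transformed GI50 values. The optional `fixed_variance` argument is one possible reading of the "fixed variance" described above (a single value substituted for both group variances); that interpretation, the function name, and the example numbers are assumptions, and the cited publication should be consulted for the actual procedure.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b, fixed_variance=None):
    """Return (t, df) for a 2-sample Welch's t test.

    If `fixed_variance` is given, it replaces both sample variances
    (an assumed reading of the variance floor described in the text).
    """
    na, nb = len(group_a), len(group_b)
    va = variance(group_a) if fixed_variance is None else fixed_variance
    vb = variance(group_b) if fixed_variance is None else fixed_variance
    sa, sb = va / na, vb / nb              # squared standard errors
    t = (mean(group_a) - mean(group_b)) / math.sqrt(sa + sb)
    df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df

# Hypothetical log10(GI50) values for lesion-positive and -negative lines.
with_lesion = [-1.4, -1.1, -0.9, -1.6]
without_lesion = [0.2, 0.8, -0.1, 0.5, 0.9]
print(welch_t(with_lesion, without_lesion))
print(welch_t(with_lesion, without_lesion, fixed_variance=0.25))
```

The P value would follow from the t distribution with `df` degrees of freedom, which is left to a statistics library.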

In a next step, multilesion predictors of sensitivity were calculated
using feature selection, with subsequent validation by a KNN algorithm
with a leave-one-out strategy (45), in which the same choice of samples
was used as above for Fisher's exact test: For all but 1 sample, genetic
lesions strongly discriminating between sensitive and resistant cell lines
were selected and the prediction was validated by the remaining left-out
sample. Copy number data were dichotomized to ensure better comparability
with the mutation data. Five different thresholds were used to
dichotomize the copy numbers: 2.14, 2.46, 2.83, 3.25, and 4 for amplified
loci; and 1.87, 1.62, 1.41, 1.23, and 1 for deletions. The collection of features
and the dichotomization threshold for which the leave-one-out validation
showed the best performance were selected and taken as the best combined
predictor for the respective compound. As a measure to select the setting
with the largest predictive strength, the Youden index
(sensitivity + specificity - 1) was used.
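
The dichotomization and the Youden index can be sketched as follows. This is a hedged illustration: whether the thresholds are inclusive, and the function names themselves, are assumptions rather than details given in the text.

```python
def dichotomize(copy_numbers, amp_threshold, del_threshold):
    """Turn copy numbers into binary amplification and deletion calls.

    Inclusive comparisons (>= and <=) are an assumption of this sketch.
    """
    amplified = [1 if cn >= amp_threshold else 0 for cn in copy_numbers]
    deleted = [1 if cn <= del_threshold else 0 for cn in copy_numbers]
    return amplified, deleted


def youden_index(true_labels, predicted_labels):
    """Youden index = sensitivity + specificity - 1 for binary labels (1 = sensitive)."""
    tp = sum(1 for t, p in zip(true_labels, predicted_labels) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(true_labels, predicted_labels) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(true_labels, predicted_labels) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(true_labels, predicted_labels) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0
```

A setting that predicts every left-out sample correctly scores 1, while a predictor no better than chance scores about 0.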

For example, the best erlotinib single-gene predictor was obtained when
the lesion data were dichotomized using the thresholds 3.25 and 1.23,
respectively. Cell lines with a GI50 of less than 0.07 µM were considered
sensitive. For the predictor, the same cutoff values were used. Best performance
in the leave-one-out cross validation was obtained using 15 features,
k = 3 neighbors, and the cosine-based metric. Due to the problem of
multiple hypothesis testing, the significance of the above Welch's t tests as
well as Fisher's exact tests should be understood in an exploratory rather
than confirmatory sense.
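
A leave-one-out validation of a k-nearest-neighbor classifier with a cosine-based metric might look like the sketch below. It is deliberately simplified: the per-fold feature selection described in the text is omitted, and all names are hypothetical.

```python
from math import sqrt


def cosine_distance(u, v):
    """1 - cosine similarity; all-zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote (1 = sensitive, 0 = resistant) among the k nearest samples."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: cosine_distance(train_x[i], query))
    votes = [train_y[i] for i in ranked[:k]]
    return 1 if 2 * sum(votes) > len(votes) else 0


def leave_one_out(features, labels, k=3):
    """Predict each sample from all remaining samples."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predictions.append(knn_predict(train_x, train_y, features[i], k))
    return predictions
```

In a faithful implementation, the discriminating lesions would be re-selected inside each fold from the training samples only, so that the left-out sample never influences feature choice.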

The NCI-60 cancer cell-line panel was used for validation of our findings
(http://dtp.nci.nih.gov/mtargets/mt_index.html). Since the MEK
inhibitor U0126 and the Hsp90 inhibitor 17-AAG were not covered by
the collection of pharmacological data, we analyzed the association of the
respective lesions to hypothemycin (MEK inhibitor) and to geldanamycin
(17-AAG is a geldanamycin derivative) instead. Significance of association
was analyzed by Fisher's exact test. Due to strongly discordant GI50 values,
the cell lines HOP62 and A549 were excluded from the analysis with
respect to the Hsp90 inhibitors. The thresholds for 1q21.3 amplification
were set according to the overall distribution of copy number changes in
the respective data sets (2.7 corresponding to 33% of the NSCLC cell lines;
2.4 corresponding to 33% of the NCI-60 collection).

All Fisher's exact tests, Welch's t tests (all 2-tailed), and Wilcoxon tests
were performed using R version 2.7.1
(http://www.wpic.pitt.edu/WPIC-CompGen/hclust/hclust.htm). A level of
significance of 5% was chosen. For cluster analysis, the R routine "hclust" was used.

Structural modeling of compound binding. The crystal structures of dasatinib
bound to ABL kinase (pdb code 2GQG; ref. 74) and vandetanib
bound to the RET kinase (pdb code 2IVU; ref. 75) were aligned to the
kinase domain of EGFR bound to erlotinib (pdb code 1M17; ref. 76)
using PyMOL software, version 1.1beta (DeLano Scientific LLC). Based on the
structural alignment of ABL with EGFR, the binding mode for dasatinib
in EGFR is identical to that of the dasatinib-ABL complex. Figures of the
structures were prepared using PyMOL.

Western blot analyses. Whole-cell lysates were prepared in NP40 lysis buffer
(50 mmol/l Tris-HCl, pH 7.4, 150 mmol/l NaCl, 1% NP40) supplemented
with protease and phosphatase inhibitor I and II cocktails (Merck) and
clarified by centrifugation. Proteins were subjected to SDS-PAGE on 12%
gels, except where indicated. Western blotting was done as described previously
(77). The EGFR (no. 2232), the AKT (no. 9272), and the phospho-SRC
(Tyr416) (no. 2101) antibodies were all purchased from Cell Signaling
Technology. The SRC (GD11) antibody was purchased from Millipore.
The Hsp90 antibody (16F1) was purchased from Stressgen (Assay Designs).
The phospho-EGFR (Tyr1068) antibody was purchased from BioSource
(Invitrogen). The cyclin D1 (DCS-6), the c-RAF (C-20), and the actin (C-11)
antibodies were purchased from Santa Cruz Biotechnology Inc. The KRAS
(234-4.2) antibody was purchased from Calbiochem.

Immunoprecipitation. For the detection of complexes of Hsp90 with KRAS
or EGFR and vice versa, whole-cell lysate (0.5-1 mg) in NP40 lysis buffer
was incubated with Agarose A/G Plus preconjugated with the Hsp90
or KRAS antibody (see Western blot analyses). Immunoprecipitates were
washed in NP40 lysis buffer, boiled in sample buffer, and subjected to SDS-PAGE
followed by Western blotting using an anti-KRAS, Hsp90, or EGFR
antibody to detect complex formation.


Apoptosis assays. Cells were plated in 6-well plates and, after 24 hours of
incubation, treated with 17-AAG for 72 hours and then harvested
after trypsinization. Cells were then washed with PBS, resuspended in
annexin V binding buffer, and stained with annexin V-FITC and
propidium iodide. FACS analysis was performed on a FACSCanto flow
cytometer (BD Biosciences), and results were calculated using FACSDiva
Software, version 5.0.

Transfection and infection. Replication-incompetent retroviruses were produced
from pBabe-based vectors by transfection into the Phoenix 293-TL
packaging cell line (Orbigen) using the calcium precipitation method.
Replication-incompetent lentiviruses were produced from pLKO.1-puro-based
vectors containing the shRNA insert (http://www.broad.mit.edu/node/563)
by cotransfection of 293-TL cells with pMD.2 and pCMVd.8.9
helper plasmids using reagent Trans-LT (Mirus). Cells were infected with
viral supernatants in the presence of polybrene. After 24 hours, medium
was changed and cell lines were selected with 1-2 µg/ml puromycin, from
which stably transduced clonal cell lines were derived.

Site-directed mutagenesis. All mutations (Y530F; T341M) were introduced
into the c-SRC ORF with the QuikChange XL II Mutagenesis Kit (Stratagene)
following the instructions of the manufacturer. Oligonucleotides
covering the mutations were designed with the software provided by Stratagene,
and each mutant was confirmed by sequencing.

17-DMAG treatment in LSL-KRAS mice. The lox-stop-lox-KRAS (LSL-KRAS)
mouse lung cancer model has been described elsewhere (64). Seven
mice were imaged by MRI at 12 to 20 weeks after adeno-CRE treatments
to document initial tumor volume. The mice were then divided into 17-DMAG
(LC Laboratories) and placebo treatment groups, with 4 and 3
mice in each group, respectively. 17-DMAG was formulated in saline and
given through tail-vein injection at a dosing schedule of 20 mg/kg/d. Mice were
imaged by MRI after 1 week of drug treatment and sacrificed for further
histological analysis thereafter. The protocol for animal work was approved
by the Dana-Farber Cancer Institute Institutional Animal Care and Use
Committee, and the mice were housed in a pathogen-free environment at
the Harvard School of Public Health.

MRI scanning and tumor volume measurement. Mice were anesthetized with
1% isoflurane; respiratory and cardiac rates were monitored with BioTrig
Software, version BT1 (Bruker BioSpin). Animals were imaged in the coronal
planes with a rapid acquisition with relaxation enhancement (RARE)
sequence (Tr = 2000 ms; effective TE = 25 ms, where Tr = pulse repetition time
and TE = minimum echo time), using 17 × 1 mm slices to cover the entire
lung. Matrix size of 128 × 128 and field of view (FOV) of 2.5 × 2.5 cm² were
used for all imaging. The areas of lung tumors were manually segmented
and measured using ImageJ software (version 1.33; http://rsbweb.nih.gov/ij/)
on each magnetic resonance slice. Total tumor volume was calculated
by adding tumor areas from all 17 slices (78). Note that MRI cannot clearly
distinguish tumor lesions and postobstruction pneumonia that is induced
by bronchial tumors of this particular tumor model.
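
The slice-wise volume estimate can be written out in a few lines. The sketch assumes that each segmented area is multiplied by the 1-mm slice thickness, which the text implies but does not state; the function name is hypothetical.

```python
def total_tumor_volume(slice_areas_mm2, slice_thickness_mm=1.0):
    """Sum segmented tumor areas over all slices and scale by slice thickness (mm^3)."""
    return sum(slice_areas_mm2) * slice_thickness_mm
```

With 17 slices of 1 mm each, the volume in mm³ is numerically equal to the summed area in mm².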

Xenograft models. All animal procedures were in accordance with the German
Laws for Animal Protection and were approved by the local animal
protection committee and the local authorities (Bezirksregierung Köln).
Tumors were generated by s.c. injections of 5 × 10⁶ tumor cells into nu/nu
athymic male mice. When tumors had reached a size of about 50 mm³,
animals were randomized into 2 groups, control (vehicle) and dasatinib-treated
mice. All controls were dosed with the same volume of vehicle.
Mice were treated daily by oral gavage of 20 mg/kg dasatinib. The vehicle
used was propylene glycol/water (1:1). Tumor size was monitored every 2
days by measuring perpendicular diameters. Tumor volumes were calculated
from the determination of the largest diameter and its perpendicular
diameter according to the equation [tumor volume = a × (b²/2), where
a = tumor width and b = tumor length].
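
For reference, the caliper formula quoted above translates directly into code. The sketch keeps the variable assignment exactly as stated in the text (a = width, b = length); the function name is hypothetical.

```python
def xenograft_volume(width_mm, length_mm):
    """Tumor volume = a * (b^2 / 2), with a = width and b = length as in the text."""
    return width_mm * (length_mm ** 2 / 2.0)
```

For a tumor measuring 4 mm by 6 mm, this gives 4 × 36 / 2 = 72 mm³.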
Abstract

Purpose: By hypomethylating genes, decitabine may up-regulate factors required for
chemotherapeutic cytotoxicity. Platinum-resistant cells may have reduced expression
of the copper/platinum transporter CTR1.

Experimental Design: Thirty-one patients with refractory malignancies received decitabine
2.5 to 10 mg/m² on days 1 to 5 and 8 to 12, or 15 to 20 mg/m² on days 1 to 5. Tumor
was assessed for DNA methylation (by LINE assays), apoptosis, necrosis, mitoses,
Ki67, DNA methyltransferase (DNMT1), CTR1, and p16.

Results: Febrile neutropenia was dose limiting. One thymoma patient responded. Decitabine
decreased DNA methylation in tumor (from median 51.2% predecitabine to 43.7%
postdecitabine; P = 0.01, with effects at all doses) and in peripheral blood mononuclear
cells (from 65.3% to 56.0%). There was no correlation between tumor and peripheral blood
mononuclear cell DNA methylation. Patients starting decitabine <3 versus >3 months after
last prior cytotoxic or targeted therapy had lower predecitabine tumor CTR1 scores
(P = 0.02), higher p16 (P = 0.04), and trends (P = 0.07) toward higher tumor methylation
and apoptosis. Decitabine decreased tumor DNMT1 for scores initially >0 (P = 0.04).
Decitabine increased tumor apoptosis (P < 0.05), mitoses (if initially low, P = 0.02),
and CTR1 (if initially low, P = 0.025, or if <3 months from last prior therapy, P = 0.04).
Tumor CTR1 scores correlated inversely with methylation (r = -0.41, P = 0.005), but the
CTR1 promoter was not hypermethylated. Only three patients had tumor p16 promoter
hypermethylation. p16 scores did not increase. Higher blood pressure correlated with
lower tumor necrosis (P = 0.03) and a trend toward greater DNA demethylation (P = 0.10).

Conclusions: Exposure to various cytotoxic and targeted agents might generate broad
pleiotropic resistance by reducing CTR1 and other transporters. Decitabine decreases
DNA methylation and augments CTR1 expression through methylation-independent
mechanisms.


Hypermethylation by DNA methyltransferase (DNMT) helps
regulate gene expression, and tumor suppressor gene hypermethylation
promotes tumorigenesis (1-4). Altered gene methylation
may also cause chemotherapy resistance (5). Many
factors underlie chemotherapy resistance (6), but dose-response
curve flattening at higher doses (7) suggests that deficiency of
factors required for cytotoxicity may be particularly important.
For example, platinum-resistant cells may have hypermethylation
of the MLH1 mismatch repair gene that is important in
triggering platinum cytotoxicity (8) or may have a pleiotropic
reduction in transporters (9, 10) that is potentially reversible
by the DNMT inhibitor decitabine (9). The copper transporter
CTR1 contributes to cellular platinum uptake (11). Platinum
exposure rapidly decreases CTR1 expression, thereby reducing
further platinum influx (12).

DNMT inhibitors tested clinically include decitabine
(5-aza-2'-deoxycytidine; refs. 2, 4, 13-16), 5-azacytidine (4), and MG98
(17). Decitabine inhibits DNMT, depletes DNMT1 through
proteasomal degradation (18), induces global DNA hypomethylation,
and increases expression of specific genes through
mechanisms both dependent on (2, 16) and independent
of (2, 4) promoter hypomethylation.

Of administration schedules tested in leukemias, 1-hour low-dose
decitabine infusions on days 1 to 5 + days 8 to 12 every 4
weeks may be particularly effective therapeutically (13-15,
19) and in demethylating DNA (14, 19). Low decitabine doses
(sufficient to cause DNA hypomethylation) seemed to be more
effective than higher cytotoxic doses in leukemia and myelodysplasia
(13-15). Low-dose decitabine can also restore hemoglobin F
production in sickle cell anemia (4, 20).

Translational Relevance

(a) Decitabine reduces DNA methylation in solid
tumors, and should be assessed for its ability to increase
expression of factors required for efficacy of
other agents. (b) Peripheral blood mononuclear cells
should not be used as surrogates for decitabine effect
in tumor. (c) Exposure within the previous 3
months to a wide range of chemotherapy and targeted
agents is associated with decreased copper/platinum
uptake transporter CTR1 and with a trend
to increased DNA methylation. Hence, many agents
might reduce subsequent platinum uptake. We will
explore the possibility that other transporters are also
down-regulated, that this is a mechanism underlying
epigenetic broad cross-resistance, and that
agents that do not require uptake into cells (e.g., antibodies)
could combine more effectively with chemotherapy
agents than do small molecules (which
require uptake into cells). (d) Decitabine increases
CTR1 expression, suggesting that it should be further
evaluated for its ability to reduce platinum resistance.

Because low-dose daily decitabine is effective in leukemias,
we defined the maximum tolerated dose of this schedule in patients
with refractory solid tumors and lymphomas and assessed
decitabine effect on tumor and peripheral blood
mononuclear cell (PBMC) DNA methylation and on tumor necrosis,
apoptosis, mitoses, Ki67, CTR1, DNMT1, and the tumor
suppressor gene p16 (which may be inactivated by hypermethylation;
ref. 21). Because drug delivery may vary with tissue
blood flow (22) and because tumor blood flow is more sensitive
to blood pressure than is normal tissue blood flow (23,
24), we also assessed the influence of day 1 systolic
blood pressure (SBP) on decitabine effect. Drug uptake into tumors
may also vary with tumor pH (25). Hence, we also assessed the effect
of factors that might be related to tumor pH, including serum lactate
dehydrogenase (LDH; because LDH5 converts pyruvate to lactate
in tumors; ref. 26), glucose (because administration of a glucose
load may reduce tumor extracellular pH; ref. 27), CO2 (because
administration of a bicarbonate load may raise tumor
pH; ref. 28), and chloride.

Materials  and  Methods 

Eligibility criteria for this Institutional Review Board-approved
protocol included written informed consent, solid tumor or lymphoma
refractory to standard therapy, biopsiable tumor, and adequate
organ function. Dose-limiting toxicity was defined as febrile neutropenia,
grade 4 thrombocytopenia lasting >2 wk, treatment-related
bleeding, or clinically significant ≥grade 3 nonhematologic toxicity
occurring with the first therapy cycle. Decitabine was supplied by
the National Cancer Institute Division of Cancer Treatment and Diagnosis
under a Collaborative and Research Development Agreement.


Patients received decitabine i.v. over 1 h daily, with ≥6 patients per
cohort. Cohorts 1, 2, and 3 received decitabine 2.5, 5, and 10 mg/m²/d
on days 1 to 5 and 8 to 12 of each 4-wk cycle. Because substantial
myelosuppression (but less than maximum tolerated dose) was seen
in cohort 3, and because updated data from leukemia studies (19)
suggested that administration of low-dose decitabine on days 1 to 5 was
as effective as on days 1 to 5 and 8 to 12, cohorts 4 and 5, respectively,
received 15 and 20 mg/m²/d on days 1 to 5 only. Granulocyte
colony-stimulating factor was added for cohort 5. Tumor biopsies were
done on all patients prior to decitabine and again on cycle 1 day 12
(within a few hours of the final cycle 1 dose for cohorts 1 to 3 and
7 d after last decitabine for cohorts 4 and 5). PBMCs were collected
on the days of tumor biopsies. Computed tomography scans to evaluate
tumor size were first repeated after cycle 1.

Tumors were characterized histopathologically with respect to tumor
type, % necrosis, number of mitoses, % of cells with nuclear
staining for Ki67 (29), and apoptosis by terminal deoxynucleotidyl
transferase-mediated dUTP nick end labeling (TUNEL) assay (30).
Global tumor and PBMC DNA methylation (% of CpG islands methylated)
was assessed by LINE assays (31). Change in DNA methylation
was calculated by dividing absolute change (day 12 minus day
1) by the day 1 value and multiplying by 100. Promoter methylation
for p16 and CTR1 genes was assessed by pyrosequencing, as previously
described (32).
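
The relative change in DNA methylation defined above is simple arithmetic; the sketch below spells it out. The function name and the explicit zero-baseline check are assumptions added for illustration.

```python
def methylation_change_percent(day1, day12):
    """Relative change (%) = (day 12 - day 1) / day 1 * 100."""
    if day1 == 0:
        raise ValueError("relative change is undefined for a zero baseline")
    return 100.0 * (day12 - day1) / day1
```

A tumor going from 50% to 45% methylation therefore shows a relative change of -10%, not -5%.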


Table 1. Patient characteristics

Patient characteristic                                      No. of patients
Total                                                       31
Gender
  Male                                                      17
  Female                                                    14
Median age (range)                                          53 (20-75)
Tumor type
  Malignant melanoma                                        6
  Renal cell carcinoma                                      3
  Breast carcinoma                                          4
  Cutaneous T-cell lymphoma/mycosis fungoides               3
  Thymoma/thymic carcinoma                                  4
  Adenocystic carcinoma                                     2
  Head & neck squamous carcinoma                            2
  Neuroendocrine carcinomas                                 2
  Desmoplastic tumor                                        1
  Other carcinomas                                          4
Dose level (mg/m²/d)
  2.5 × 10 d                                                6
  5 × 10 d                                                  6
  10 × 10 d                                                 7
  15 × 5 d                                                  6
  20 × 5 d                                                  6
Months from last therapy: median (range)
  Last cytotoxic therapy                                    3 (1-31)
  Last platinum (n = 22)                                    9.5 (1.5-50)
  Last cytotoxic or targeted therapy                        2 (1-18)
No. of prior systemic regimens: median (range)              5 (1-14)
No. of prior targeted agents: median (range)                2 (0-6)
No. of patients previously treated with targeted agents
  Thalidomide                                               5
  Bevacizumab                                               7
  Interferon α                                              9
  EGFR inhibitor (gefitinib, erlotinib, cetuximab, PKI-166) 8
  Histone deacetylase inhibitors                            3
No. of other prior targeted agents                          21
Table 2. Grade ≥3 toxicity with first cycle decitabine

Decitabine mg/m²/d × no. days    2.5 × 10    5 × 10    10 × 10    15 × 5    20 × 5
No. of cycles                    6           6         7          6         6

Toxicity: no. of cycles with toxicity

Neutropenia:  grade  3 

2 

1 

2 

1 

Grade  4 

5 

3 

5 

Febrile  neutropenia  grade  3 

2 

Nonneutropenic  infection  grade  3 

1 

1 

Platelets:  grade  3 

1 

Grade  4 

1 

1 

Anemia  grade  3 

1 

1 

Fatigue/↓Phosphate

1 

Fatigue  grade  3 

1 

Hyperglycemia* 

2 

Renal  vein  thrombus* 

1 

*Probably  unrelated. 

For immunohistochemistry, 5-µm-thick formalin-fixed and paraffin-embedded
tumor tissue sections were deparaffinized and hydrated.
Sections were stained using mouse antibodies for Ki67
(monoclonal, clone MIB1; dilution, 1:200; 90-min incubation at
room temperature; Dako, Inc.), CTR1 (polyclonal; dilution, 1:400;
90-min incubation at room temperature; Gene Tex, Inc.), DNMT1
(polyclonal; dilution, 1:100; 90-min incubation at room temperature;
Santa Cruz Biotechnology), and p16 (monoclonal, clone JC8; dilution,
1:50; 60-min incubation at 37°C; Lab Vision Co.). As secondary
antibody, Envision Plus Dual Link-labeled polymer (Dako, Inc.)
was used. Apoptosis was studied using TUNEL assay (Promega Co.)
according to manufacturer recommendations, but the diaminobenzidine
reaction was stopped at 3 min.

Cytoplasmic CTR1, DNMT1, and p16 expression was quantified using
a 4-value intensity score (0-3+). The cytoplasmic expression score
(range, 0-300) was then obtained by multiplying the intensity score
by the percent tumor cells staining. Nuclear CTR1, DNMT1, and Ki67
expression was reported as the percentage of positive nuclei among tumor
cells assessed. For TUNEL assessment, we counted the number of
positive apoptotic cells plus apoptotic bodies in 10 high power fields
(×40; ref. 30). Changes in these scores and in number of mitoses per
high power field, % necrosis, and apoptosis were calculated by subtracting
day 1 values from day 12 values (baseline 0 values precluded calculation
as % changes). Assessments were blinded with respect to drug
dose, cohort, and % DNA methylation.
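
The cytoplasmic expression score and its change with treatment can be illustrated as follows. The function names and the input validation are assumptions; only the multiplication rule and the day 12 minus day 1 subtraction come from the text.

```python
def cytoplasmic_score(intensity, percent_staining):
    """Expression score (0-300) = intensity (0-3) x percent of tumor cells staining."""
    if not 0 <= intensity <= 3:
        raise ValueError("intensity must be between 0 and 3")
    if not 0 <= percent_staining <= 100:
        raise ValueError("percent_staining must be between 0 and 100")
    return intensity * percent_staining


def score_change(day1_score, day12_score):
    """Absolute change: day 12 value minus day 1 value."""
    return day12_score - day1_score
```

Using an absolute difference rather than a percent change keeps the measure defined when the baseline score is 0.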

GraphPad Prism 5.0 was used for statistical calculations using two-tailed
nonparametric tests (Spearman tests for correlations, Wilcoxon
signed-rank tests for paired comparisons, and Mann-Whitney and
Kruskal-Wallis tests for comparison of two groups or more than two groups,
respectively). Small sample size precluded multivariate analyses. For
dichotomization of continuous variables, cut-points were chosen arbitrarily
by inspection of the data to try to maximize differences
between higher and lower value groups.
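
As a reminder of what the Spearman coefficients reported in the Results measure, the sketch below computes a rank correlation with average ranks for ties. It is a plain-Python illustration only; the analyses in the text were run in GraphPad Prism, and the function names here are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)
```

Because only ranks enter the calculation, the coefficient is insensitive to the skewed score distributions typical of immunohistochemistry data.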

Results 

The trial accrued 31 patients from September 2004 to March
2007. Patient demographics are outlined in Table 1 and first-cycle
toxicity in Table 2. First-cycle dose-limiting febrile neutropenia
developed in two of six cohort 5 patients. Febrile neutropenia
also developed during a later cycle of therapy in one
patient in cohort 2. The dose recommended for phase II trials is
10 mg/m²/d on days 1 to 5 and 8 to 12, or 15 mg/m²/d on days 1 to 5
(with granulocyte colony-stimulating factor).


Time to progression and response. Median time to progression
(TTP) was 7.1 (range, 4 to 29) weeks. There was one partial remission
(thymoma). At first planned re-evaluation at 4 weeks,
17 of 28 evaluable patients had disease stability (including 3
minor responses in cutaneous T-cell lymphoma, malignant
melanoma, and appendiceal adenocarcinoma, respectively).
Tumor characteristics were similar in patients with partial or
minor responses versus stability or progression. Fifteen patients
received 1 decitabine cycle, 10 received 2, and 6 received 3 to 7
cycles. Tumor growth was the most frequent reason for therapy
discontinuation.

Cohorts 4 to 5 had a shorter median TTP (4.5 versus 10
weeks; P = 0.02) and a trend toward greater increase in tumor
size with first cycle (12.6% versus 7.0%; P = 0.17) than cohorts
1 to 3. Neither outcome correlated significantly with decitabine
dose (r = -0.24 and r = 0.22, respectively) or with tumor characteristics,
although there were slight trends toward TTP correlations
with tumor type (median of 6.0 weeks for epithelial
tumors versus 8.4 weeks for others; P = 0.10), number of mitoses
(r = -0.34; P = 0.09), apoptosis score (r = 0.34; P = 0.17),
and DNMT1 score (r = -0.32; P = 0.12) and toward tumor size
change correlations with mitoses (r = 0.32; P = 0.13) and CTR1
score (r = 0.40; P = 0.07).

Time from last prior therapy and predecitabine tumor characteristics.
Patients with shorter times from last prior cytotoxic or
targeted therapies had a trend toward higher predecitabine tumor
DNA methylation (55.2% for <3 months from last therapy
versus 39.6% for >3 months, P = 0.07; Fig. 1A). They also
tended to have higher apoptosis scores (r = -0.44; P = 0.07)
and had significantly higher p16 scores (Fig. 1B). Patients with
last therapy <3 months before decitabine had lower predecitabine
CTR1 scores than did those >3 months from last therapy
(median score 90 versus 285; P = 0.02). CTR1 scores correlated
more closely with time from last targeted or cytotoxic therapy
(Fig. 1C) than with time only from last cytotoxic therapy
(Fig. 1D), or time from last platinum (r = -0.13; P = 0.63).
CTR1 score was not significantly different for patients who
had versus had not received a platinum agent previously (median
scores 100 versus 110; P = 0.69). Time from last therapy did
not correlate with number of mitoses (r = 0.04), Ki67 positivity
(r = -0.01), % necrosis (r = -0.09), or DNMT1 score (r = -0.009).

Fig. 1. Tumor characteristics and time from last prior therapy. A, predecitabine
tumor DNA methylation versus time from last prior cytotoxic or targeted
therapy (P = 0.07 for <3 versus >3 mo). B, predecitabine p16 score versus time
from last prior cytotoxic or targeted therapy (r = -0.42; P = 0.04). C,
predecitabine CTR1 score versus time from last prior cytotoxic or targeted
therapy (r = 0.34; P = 0.11; P = 0.02 for >3 versus <3 mo). D, predecitabine
CTR1 score versus time from last prior cytotoxic chemotherapy (r = 0.22; P = 0.32).


Effect of decitabine on tumor global DNA methylation. In
paired comparisons, there was a significant reduction in global
tumor DNA methylation with decitabine, with a median relative
reduction in tumor DNA methylation of 6% (mean, 8%;
range, 52% decrease to 24% increase). Reduction in tumor
DNA methylation was seen at all decitabine dose levels (median
relative decreases 6%, 3%, 2.5%, and 12% with 25, 50, 75,
and 100 mg/m²/cycle, respectively; P = 0.16 for 100 mg/m²/cycle
versus others combined) and in all cohorts (median relative
decreases 6%, 3%, 15%, 2.5%, and 4% in cohorts 1 to 5, respectively;
P = 0.052 for cohort 3 versus others combined; Fig. 2A).
Change in methylation did not correlate significantly with predecitabine
tumor characteristics, although there tended to be
greater reduction in methylation in tumors with more baseline
mitoses (r = -0.40; P = 0.07).

Change in PBMC DNA methylation. PBMCs did not correlate
significantly with tumors with respect to either DNA methylation
(predecitabine and postdecitabine; r = 0.22; P = 0.09) or
change in DNA methylation (r = 0.02; P = 0.91). Change in tumor
DNA methylation was less than change in PBMC DNA
methylation (PBMC median change, -14%; range, 47% reduction
to 23% increase; P = 0.04 for tumor versus PBMC).

Gender and tumor type. Tumor DNA methylation was
52.5% in females versus 47.4% in males (P = 0.73), and
methylation change was -8.4% versus -3.3% in females versus
males (P = 0.39). "Standard" epithelial tumors tended to
have slightly higher predecitabine methylation than did other
tumor types (melanomas, lymphomas, thymomas, neuroendocrine
carcinomas, desmoplastic tumors; 55.5% versus
45.4%; P = 0.13), and epithelial tumors had less DNA methylation
change with decitabine (median reduction, 2.6% versus
11.2%; P = 0.026).

DNMT1 scores. Although cytoplasmic staining for DNMT1
was noted in 16 of 25 evaluable samples, nuclear staining
was only evident in 8. Predecitabine, cytoplasmic DNMT1
scores did not correlate with DNA methylation (Table 3), and
changes in these parameters with decitabine also did not correlate
(r = -0.29). For tumors with predecitabine scores of >0,
there was a reduction in cytoplasmic DNMT1 scores with decitabine
(Table 4).

p16 scores. Predecitabine p16 score did not correlate with
DNA methylation, but did correlate with apoptosis (Table 3),
and was higher in epithelial tumors than in other types (median
score, 75 versus 0; P = 0.01). Change in p16 score did not
correlate with change in DNA methylation (r = 0.04), and decitabine
had little effect on p16 scores (Table 4). Only 1 of 12
tumors initially negative for p16 converted to positive. Predecitabine
p16 promoter methylation was less than or equal to the
background noise level (10%) in 25 of 28 evaluable tumors.
The 3 with higher baseline levels went from 46% to 37%,
27% to 23%, and 16% to 47%, respectively, with p16 remaining
undetectable in the first 2, and going from 10/300 to undetectable
in the third.

CTR1 score. CTR1 staining was predominantly cytoplasmic, with nuclear staining
identified in only six predecitabine and five postdecitabine samples. CTR1
scores were slightly lower in epithelial tumors than in other tumor types
(median score, 80 versus 125; P = 0.15). Predecitabine CTR1 score was
significantly higher in tumors with >7 versus <7 mitoses (Table 3). In paired
comparisons, CTR1 scores increased significantly with decitabine when initial
scores were <200 and in patients starting decitabine <3 months after last prior
therapy (Table 4; Fig. 2B and C). CTR1 change did not vary significantly with
dose or cohort. Although there was a strong inverse correlation between CTR1
score and methylation (Table 3; Fig. 2D), change in CTR1 score did not
correlate significantly with change in methylation (r = 0.23), and CTR1
promoter methylation was <10% (background noise level) in all evaluable tumor
samples (24 predecitabine, 26 postdecitabine) and in all 36 cell lines tested.

DNA methylation and markers of proliferation and cell death, tumor size change,
and TTP. Mitoses and Ki67 tended to increase with decitabine in tumors in which
they were initially low, and apoptosis increased significantly, but necrosis
did not change (Table 4). Predecitabine DNA methylation did not correlate with
mitoses, Ki67, % necrosis, or apoptosis (Table 3), and also did not correlate
with tumor size change over first therapy cycle (r = -0.20) or with TTP
(r = 0.20). Similarly, change in DNA methylation did not correlate
significantly with change in any of mitoses (r = 0.07), Ki67 (r = 0.16),
apoptosis (r = 0.002), necrosis (r = -0.38; P = 0.08), or tumor size
(r = -0.26) or with TTP (r = 0.29; P = 0.20). Despite the lack of correlation
of predecitabine methylation and methylation change with these factors,
postdecitabine methylation did correlate with postdecitabine mitoses
(r = -0.56; P = 0.002) and Ki67 (r = -0.43; P = 0.04), with a trend to an
association with TTP (r = 0.29; P = 0.15).
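
The r values in this section are Spearman coefficients (Table 3). For readers who want to see what such a coefficient measures, the sketch below computes it as the Pearson correlation of average ranks. The function names and the example values are hypothetical and are not taken from the study data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical values (illustration only): % methylation and mitotic counts.
methylation = [60.1, 55.0, 48.2, 40.3, 35.9]
mitoses = [1, 3, 2, 6, 9]
r = spearman_r(methylation, mitoses)
print(round(r, 2))
```

Because only ranks enter the calculation, the coefficient is insensitive to the scale of semiquantitative scores, which suits immunohistochemistry readouts.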

Blood pressure and pH factors. Patients with SBP <120 versus >120 mm Hg had
significantly higher predecitabine tumor necrosis (median, 40% versus 15%;
P = 0.03). For SBP <140 versus >140 mm Hg, first cycle tumor size change was
7.5% versus 15.6% (P = 0.03), whereas DNA methylation change was -3.3% versus
-11.6% (P = 0.10). If SBP is multiplied by dose/cycle (because both higher SBP
and higher dose might increase tumor exposure to drug), DNA methylation change
was -12% versus -3% for patients with values >9,000 versus <9,000 (P = 0.04).
There was a trend toward a greater reduction in DNA methylation with higher
predecitabine serum LDH (r = -0.38; P = 0.058), possibly because LDH also
correlated with predecitabine mitoses (r = 0.40; P = 0.04). There were no
correlations of interest between serum glucose, chloride or CO2, and changes in
DNA methylation.
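
The SBP-by-dose product described above can be sketched as follows. The function names and patient records are hypothetical, the units are assumed to be mm Hg for SBP and mg/m2/cycle for dose, and values exactly equal to the cutoff are left unassigned because the text specifies only >9,000 versus <9,000.

```python
def exposure_index(sbp_mm_hg, dose_mg_m2_per_cycle):
    """Product of systolic blood pressure and decitabine dose per cycle."""
    return sbp_mm_hg * dose_mg_m2_per_cycle


def split_by_cutoff(patients, cutoff=9000):
    """Return (high, low) groups; values equal to the cutoff fall in neither."""
    high = [p for p in patients if exposure_index(p["sbp"], p["dose"]) > cutoff]
    low = [p for p in patients if exposure_index(p["sbp"], p["dose"]) < cutoff]
    return high, low


# Hypothetical patients (illustration only, not data from this study).
patients = [
    {"id": "A", "sbp": 150, "dose": 75},   # 11,250
    {"id": "B", "sbp": 110, "dose": 50},   # 5,500
    {"id": "C", "sbp": 130, "dose": 100},  # 13,000
    {"id": "D", "sbp": 120, "dose": 60},   # 7,200
]
high, low = split_by_cutoff(patients)
print([p["id"] for p in high], [p["id"] for p in low])
```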


Discussion 

This daily ×5 to 10 decitabine regimen was well tolerated in very heavily
pretreated solid tumor and lymphoma patients. Neutropenia was dose limiting.
Decitabine reduced global DNA methylation, particularly in nonepithelial tumors
in this study. In leukemias, dose-response curves for decitabine-induced
demethylation flatten at higher doses (33). Although the relationship between
dose and effect was not statistically significant in our study, trends to
increased demethylation with both higher decitabine doses and with higher SBP
[which might augment tumor blood flow (23, 24), thereby enhancing drug
delivery; ref. 22] suggest a dose-response effect, as previously suggested for
6-hour decitabine infusions in solid tumors (16). However, methylation
decreased even with lowest doses tested, and the changes we noted at lower
doses were comparable with those previously reported at higher doses using
6-hour decitabine infusions (16).

Fig. 2. Change in tumor characteristics with decitabine. A, change in tumor %
DNA methylation versus cohort (across all cohorts, P = 0.40; cohort 3 versus
combined others: median, -15% versus -3%; P = 0.052). B, predecitabine versus
postdecitabine CTR1 score for predecitabine CTR1 of <100 (median, 0 versus 90;
paired P = 0.02, increase in 8 of 10). C, change in CTR1 score with decitabine
for patients with last prior therapy <3 versus >3 mo before decitabine (median,
30 versus 0; P = 0.03). D, tumor CTR1 score versus % DNA methylation
predecitabine and postdecitabine (r = -0.41; P = 0.005).

In our study, we saw a median relative reduction in tumor DNA methylation of 6%
(mean, 8%; range, 52% decrease to 24% increase), with a median 12% decrease in
methylation at the highest dose tested (100 mg/m2/cycle). In comparison, using
a single 6-hour i.v. infusion of decitabine in combination with carboplatin,
Appleton et al. (16) noted a mean demethylation of the MAGE1A promoter of 3.5%
in tumor at their decitabine maximum tolerated dose of 90 mg/m2/cycle, with a
maximal demethylation of 6.8%. Although we reported a relative change in % DNA
methylation, it was not clear whether Appleton et al. (16) were reporting a
relative or an absolute change in methylation. Aparicio et al. (34)
administered decitabine 20 to 40 mg/m2 to solid tumor patients as a 72-h
continuous i.v. infusion, and detected decreased promoter methylation for some
genes in some patients, but they did not report the degree of demethylation.
Furthermore, in their study, methylation was instead increased in some
patients, and change in methylation did not correlate with decitabine dose
(34). Schrump et al. (35) also gave decitabine 60 to 90 mg/m2 as a continuous
i.v. infusion to solid tumor patients, and postdecitabine, they noted increased
expression of 75 genes but decreased expression of 324 genes, and they did not
report % change in DNA methylation. Overall, the available data from our study
and others suggest that decitabine is able to reduce DNA methylation in solid
tumors. The effect of dose is unclear, but our data and Appleton's data suggest
that higher doses within the range tolerated may have a greater effect. The
effect of schedule of drug administration is also unclear. There is no
indication that our daily 1-h infusion schedule was any less effective than
more prolonged administration schedules.
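
The distinction between relative and absolute change matters when comparing the figures above across studies. A short worked sketch, using a hypothetical tumor whose methylation falls from 50% to 47% (values chosen for illustration, not study data):

```python
def absolute_change(pre_pct, post_pct):
    """Change in percentage points."""
    return post_pct - pre_pct


def relative_change(pre_pct, post_pct):
    """Change as a percentage of the predecitabine value."""
    return 100.0 * (post_pct - pre_pct) / pre_pct


# Hypothetical tumor (illustration only): 50% methylated before, 47% after.
pre, post = 50.0, 47.0
print(absolute_change(pre, post))  # change in percentage points
print(relative_change(pre, post))  # change relative to baseline, in %
```

In this example the same measurement is a 3-point absolute decrease but a 6% relative decrease, so a reported "3.5%" from another study cannot be compared directly with a relative figure unless the convention is known.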

PBMC DNA methylation was not a reliable surrogate for tumor methylation. We and
others (16) found a greater decitabine effect on methylation in PBMCs than in
tumors, possibly due to differences in kinetics or drug accessibility. DNA
synthesis is required for decitabine incorporation into DNA, for DNMT
entrapment and for DNA demethylation (36), in keeping with the trend noted
toward greater demethylation in tumors with more predecitabine mitoses.

Decitabine promotes proteasomal degradation of DNMT (18), and DNMT1 was
decreased in tumors in which it was detectable predecitabine by our
immunohistochemistry methods, although changes in DNMT1 and methylation did not
correlate. The previously reported proteasomal degradation of DNMT1 in cell
lines was seen predominantly in the cell nucleus (18), although we found mainly
cytoplasmic changes in DNMT1 in our study. We are unaware of any other clinical
assessments of the effect of decitabine on DNMT1 expression by
immunohistochemistry. Because the functional role of DNMT1 is within the cell
nucleus, it is unclear whether the changes in cytoplasmic expression of DNMT1
we detected are of any biological significance.

In secondary exploratory analyses (which should be interpreted cautiously in
light of small patient numbers, population heterogeneity, multiplicity of
analyses, and use of semiquantitative immunohistochemistry), mitoses and Ki67
tended to increase with decitabine when initially low, postdecitabine mitoses
and Ki67 correlated inversely with postdecitabine methylation, and TTP tended
to be shorter with low postdecitabine methylation, suggesting that, although
decitabine increases apoptosis, it may also enhance proliferation by
up-regulating progrowth signaling pathways. The association of shorter TTP with
schedule could possibly indicate that effect on proliferation varies with
decitabine schedule, although patient selection might also account for this.
Hence, decitabine might be better used in combination with other agents as a
potential resistance modulator rather than being used alone in solid tumors.

Platinum-resistant cell lines may have reduced CTR1 (a copper transporter that
plays a role in cellular platinum uptake; ref. 37) and multiple other membrane
transporters (9, 10), and decitabine may up-regulate some transporters in
platinum-resistant cells (9). We hypothesized that the dose-response curve
flattening seen at higher chemotherapy doses in non-small cell lung cancer and
other malignancies could be explained in part by down-regulation and saturation
of factors required for drug efficacy, including various transporters (7).
Here, we found that, compared with tumors not recently treated, tumors treated
recently with any cytotoxic or targeted therapy had significantly less CTR1 but
increased p16 and trends to increased methylation and apoptosis. Although CTR1
promoter was not hypermethylated, there was a strong negative correlation
between global DNA methylation and CTR1 score, and administration of decitabine
(which activates gene expression through mechanisms both dependent on and
independent of promoter hypomethylation; refs. 2, 4) significantly increased
CTR1 score for those with initial scores of <200 and for patients who had
received their last prior therapy <3 months earlier. Hence, DNA
hypermethylation may play an indirect role in decreasing CTR1 expression (for
example, by decreasing cell proliferation), and decitabine may be effective at
increasing its expression by up-regulating expression of factors that in turn
promote CTR1 expression. We are currently also assessing expression of other
transporters in these tumors. Alternatively, it remains possible that the
increase in CTR1 is a more nonspecific effect of chemotherapy administration
and that agents with other mechanisms of action would also increase CTR1
expression. Against this possibility is the observation that CTR1 expression
increased with increasing time from last therapy with other agents.

In keeping with published cell line (9) and xenograft (8) data, our
observations suggest a potential role for decitabine as a resistance modulator
in tumors with reduced transporters. Although other dose-schedules have been
ineffective (38, 39), combining multiple day decitabine administration with
platinums in chemonaive patients could be of interest. DNA synthesis (during
either cell division or DNA repair) is required for decitabine-induced
hypomethylation (36). Platinum binding to DNA generates DNA repair (6). Hence,
platinums could potentiate DNA demethylation by augmenting decitabine
incorporation into DNA, whereas DNA demethylation could potentially inhibit
emergence of resistance to the platinum.

Table 3. Spearman coefficients for correlations between predecitabine tumor characteristics

Characteristic            n   P16 score       CTR1 score        DNMT1 score  Tumor % DNA methylation  Apoptosis score  % necrosis        % Ki67 positive
No. of mitoses            26  -0.14           0.39* (P = 0.06)  0.17         0.004                    -0.13            0.17              0.14
% Ki67 positive           23  0.26            0.19              0.04         0.07                     -0.04            -0.36 (P = 0.09)
% necrosis                27  -0.22           -0.08             -0.25        -0.20                    -0.18
Apoptosis score           18  0.53 (P = 0.03) 0.03              -0.21        0.12
Tumor % DNA methylation   27  0.21            -0.45 (P = 0.04)  -0.15
DNMT1 score               25  0.02            0.34 (P = 0.11)
CTR1 score                23  0.05
P16 score                 25

NOTE: Only P values <0.20 are shown.

*Predecitabine CTR1 score was significantly higher in tumors with >7 mitoses than in those with <7 (200 vs 85, P = 0.02).

Table 4. Postdecitabine vs predecitabine tumor characteristics

                                          Predecitabine         Postdecitabine        Wilcoxon signed
Factor                               n*   Median   Range        Median   Range        rank (paired) P
No. of mitoses:
  All                                24   2.0      0-15         3.5      0-28         0.12
  If predecitabine mitoses <7        18   1.5      0-7          3.0      0-28         0.02
% tumor cells expressing Ki67:
  All                                20   20       0-70         22.5     0-85         0.53
  If predecitabine Ki67 <25          14   7.5      5-25         12.5     5-75         0.15
% of tumor that is necrotic:
  All                                25   20       0-90         20       0-70         0.32
  If predecitabine necrosis >30%     8    65       40-90        35       0-70         0.02
  If predecitabine necrosis <30%     17   5        0-30         5        0-50         0.30
Apoptosis score                      17   22       2-230        49       0-269        0.049
% DNA methylated:
  Tumor                              25   51.2     21.1-65.4    43.7     17.7-67.9    0.01
  Blood                              30   65.3     39.5-74.1    56.0     33.8-64.5    <0.001
DNMT1 cytoplasmic score:
  All                                23   30       0-280        50       0-180        0.56
  If predecitabine DNMT1 >0          16   120      20-280       95       0-180        0.04
DNMT1 nuclear score:
  All                                23   0        0-110        0        0-40         0.50
  If predecitabine DNMT1 >0          7    10       5-110        5        0-40         0.50
CTR1 score:
  All                                21   100      0-300        95       0-300        0.22
  If predecitabine CTR1 <200         15   75       0-160        90       0-255        0.025
  If predecitabine CTR1 <100         9    0        0-80         90       0-100        0.02
  If <3 months from last therapy     17   80       0-270        95       0-270        0.04
P16 score                            24   5        0-300        0        0-300        0.96
% of tumor that is stroma            26   40       0-90         50       1-90         0.97
% of tumor that is fibrosis          26   30       0-60         35       0-80         0.49

*Number for which both the predecitabine and postdecitabine value is known and
the predecitabine value satisfies any criteria specified in left column.

Decitabine may also augment epidermal growth factor receptor (EGFR) expression
and restore sensitivity to EGFR inhibitors (40), suggesting a role for
decitabine in reversing some types of acquired resistance to EGFR inhibitors.
Furthermore, our observation here that CTR1 expression may be reduced by recent
exposure to targeted therapies may help explain why addition of small molecule
EGFR inhibitors to chemotherapy in non-small cell lung cancer adds little (41,
42), whereas addition of anti-EGFR antibodies to chemotherapy may improve
outcome (43). Cellular uptake of small molecules could hypothetically be
reduced by down-regulation of membrane transporters, whereas antibodies would
not require cellular uptake.

Correlation of low SBP with increased tumor necrosis is in keeping with tumor
blood flow being particularly sensitive to SBP (23, 24). The additional
observations that high SBP correlated with greater tumor growth with first
decitabine cycle, but decitabine-induced demethylation was greater with
increased SBP (possibly through improved drug delivery) suggest testing of a
strategy to maintain SBP at low levels between chemotherapy cycles but to
adjust medications to promote high SBP during chemotherapy administration and
distribution.

Although the effect of decitabine on DNA methylation and other parameters was
modest, our data support further exploration of decitabine as a
resistance-modulating agent. Patients most likely to benefit may be those most
recently treated with other agents and those with lowest expression of drug
transporters.
Abstract

Nanoparticle quantum dots (QDs) are ideal materials for multiplexed biomarker
detection, localization, and quantification. Both direct and indirect methods
are available for QD-based immunohistofluorescence (QD-IHF) staining; however,
the direct method has been considered laborious and costly. In this study, we
optimized and compared the indirect QD-IHF single staining procedure using
QD-secondary antibody conjugates and QD-streptavidin conjugates. Problems
associated with sequential multiplex staining were identified quantitatively. A
method using a QD cocktail solution was developed allowing simultaneous
staining with three antibodies against E-cadherin, EGFR, and β-catenin in
formalin-fixed and paraffin-embedded (FFPE) tissues. The expression of each
biomarker was quantified and compared using the cocktail and the sequential
method. Our results demonstrated that the QD signal for each multiplexed
biomarker was more consistent and stable using the cocktail method than the
sequential method, providing a unique tool for potential research and clinical
applications.
1. Introduction

In recent years, nanotechnology has developed rapidly and is now used in
molecular detection, imaging, diagnostics, and therapeutics in the cancer field
[1, 2]. Quantum dots (QDs) are nanoscale particles made from inorganic
semiconductors that can produce different fluorescence signals depending on
their size and components. QDs have superior signal brightness, photostability,
relatively long excited-state lifetimes, and optimized signal-to-background
ratios compared with organic dyes [3]. QDs can be covalently linked to
biological molecules such as peptides, proteins, and nucleic acids, as well as
streptavidin [4, 5]. Due to their broad excitation and narrow emission spectra,
QDs of different colors can be excited simultaneously by a single appropriate
excitation source. Together these properties render QDs ideal for multiplexed
biological imaging. They have been used for both molecular and cellular
labeling [3-7].

Many researchers have reported that QDs can immunostain more than three
biomarkers in formalin-fixed paraffin-embedded (FFPE) tissues using QD-based
immunohistofluorescence (QD-IHF) [8-11]. To date, several different staining
procedures have been utilized, including direct staining (QDs linked to the
primary antibody) and indirect staining (QDs linked to a secondary antibody or
to streptavidin) [9, 10, 12, 13]. Although the direct staining method is
straightforward, some primary antibodies may not survive the QD conjugation
process. The conformation and function of the primary antibody may be changed
and its binding properties are likely altered by covalent modifications at
either -NH2 or -COOH sites [9, 14]. Furthermore, the reagent costs are
considerable because each conjugation reaction requires up to 300 µg antibody
(Invitrogen protocol) and the yield of QD-antibody conjugates is usually low.
Since each primary antibody is covalently conjugated to just one type of QD,
changing the antibody for a given QD probe is not possible once the conjugation
is completed. Many researchers have abandoned the direct staining method since
these problems can be avoided by indirect QD staining methods.

The main advantages of indirect QD staining are its flexibility, lower costs,
and the reduced constraints on primary antibodies. Although many studies have
described detailed protocols for tissue specimen preparation, multicolor QD
staining, and image processing [8, 9, 15], they did not provide detailed
discussion or quantitative analysis of how their multiplexed biomarker staining
procedures were optimized. In this study, we compared multiple QD staining in a
sequential order with that in a simultaneous combination, using two different
methods (QD-secondary antibody conjugates and QD-streptavidin conjugates), and
quantitatively evaluated these staining methods for each of the tested
biomarkers.

2.  Materials  and  methods 

2.1. Materials.

Mouse anti-human E-cadherin (E-cad) was purchased from BD Biosciences (Franklin
Lakes, NJ, USA), rabbit anti-human epidermal growth factor receptor (EGFR) was
from BioGenex (San Ramon, CA, USA), and goat anti-human β-catenin was from R&D
Systems (Minneapolis, MN, USA). All of the primary antibodies were diluted with
antibody diluent (Dako, Carpinteria, CA, USA). QD-secondary antibody conjugates
(QD-2nd Ab) and QD-streptavidin conjugates (QD-streptavidin), namely Qdots® 565
goat F(ab’)2 anti-mouse IgG conjugates, Qdots® 605 goat F(ab’)2 anti-rabbit IgG
conjugates, Qdots® 655 rabbit F(ab’)2 anti-goat IgG conjugates, and Qdots®
streptavidin conjugates (565, 605, 655 nm), were bought from Invitrogen
(Carlsbad, CA, USA) and diluted with 6% bovine serum albumin (BSA) (Sigma, St.
Louis, MO) in phosphate buffered saline (PBS). Ready-to-use biotinylated goat
anti-mouse/rabbit/goat IgG (biotinylated 2nd Ab) was obtained from Vector
Laboratories (Burlingame, CA, USA). The spectrofluorometer was a QuantaMaster™
UV VIS from Photon Technology International (PTI).

2.2.  Human  tissue  samples. 

Using an Institutional Review Board-approved consent for tissue acquisition,
specimens for this study were obtained from surgical specimens from patients
who were diagnosed at Emory University Hospital with squamous cell carcinoma of
the head and neck (SCCHN), whose initial treatment was surgery, and who had not
received prior treatment with radiation and/or chemotherapy. The clinical
information on the samples was obtained from the surgical pathology files in
the Department of Pathology at Emory University according to the regulations of
the Health Insurance Portability and Accountability Act. After a routine
process to generate formalin-fixed, paraffin-embedded (FFPE) samples, the
blocks were sectioned to 4 µm each and mounted on coated slides. Each sample
was analyzed by a pathologist after hematoxylin and eosin (H&E) staining.

2.3.  Single  QD-IHF  staining  with  QD-2nd  Ab  or  QD-streptavidin. 

Before QD-IHF staining, we confirmed the primary antibodies were suitable for
IHC and selected FFPE samples which were strongly positive for staining of the
primary antibodies as positive control slides. Then, dilution and incubation
conditions for the primary antibody and QD-conjugates were optimized by
quantification for QD-IHF staining. (1) The QD-IHF procedure with QD-2nd Ab was
briefly as follows (shown schematically in Fig. 1A). After deparaffinization
and rehydration, antigen retrieval was performed using citric acid (10 mM, pH
6.0) in a microwave at 95°C for 10 min. The tissue slides were blocked with 5%
normal goat serum (Dako) for 10 min before the primary antibody incubation
(E-cad 1:2,000 dilution, EGFR 1:150 dilution, or β-catenin 1:2,000 dilution)
for 1 hour at 37°C. After three washes with PBS (5 min each), the slides were
incubated with QD [QD 565 goat F(ab’)2 anti-mouse IgG conjugates, QD 605 goat
F(ab’)2 anti-rabbit IgG conjugates, or QD 655 rabbit F(ab’)2 anti-goat IgG
conjugates accordingly] in 6% BSA for 1 hour at 37°C. After washing with PBS 3
times, the nuclei were counterstained with 4′,6-diamidino-2-phenylindole (DAPI)
(Invitrogen, Carlsbad, CA, USA). The slides were mounted with Cytoseal™ 60
mounting medium (Richard-Allan Scientific, MI). (2) For QD-IHF staining with
QD-streptavidin (shown schematically in Fig. 1B), slides were prepared as
above. After the primary antibody incubation, slides were incubated with
biotinylated 2nd Ab for 20 min at room temperature (RT), and washed 3 times
with PBS (5 min each). Slides were incubated with QD 565, QD 605, or QD
655-streptavidin (1:100) in 6% BSA for 1 hour at 37°C and washed 3 times with
PBS (5 min each). After nuclei counterstaining and mounting, the slides were
kept in the dark at 4°C until visualization and quantification. Mouse, rabbit
or goat IgG was used as a negative control.

2.4.  Sequential  QD-IHF  staining  with  QD-streptavidin. 

Slide selection and preparation were the same as for single QD-IHF staining.
Primary antibodies used for sequential staining were as described above. After
the primary antibody E-cad incubation (1:2,000 dilution), the slides were
incubated with the biotinylated 2nd Ab for 20 min at RT and washed 3 times with
PBS (5 min each). Slides were then incubated with QD 565-streptavidin (1:100)
in 6% BSA for 1 hour at 37°C and washed with PBS (5 min each) 3 times. After
staining the first biomarker with QDs, the staining procedure was repeated from
the blocking step, except the primary antibody and QD conjugates were replaced
with EGFR (1:150) and QD 605-streptavidin (1:100), respectively. Then the
slides were mounted after nuclear counterstaining. For QD signal comparison, we
also reversed the staining sequence, staining EGFR with QD 565-streptavidin as
the first step and E-cad with QD 605-streptavidin as the second. Mouse and
rabbit IgG were used as negative controls.

2.5.  Multiple  QD-IHF  staining  with  cocktail  or  sequential  method. 

Before QD-IHF staining, the selected slides were confirmed to be strongly
positive for E-cad, EGFR, and β-catenin expression. (1) For the cocktail
staining method (shown schematically in Fig. 4A(i)), we chose primary
antibodies of distinct species origins, including mouse anti-human E-cad,
rabbit anti-human EGFR, and goat anti-human β-catenin. Therefore, for QD-2nd
Abs, we selected QD 565 goat F(ab’)2 anti-mouse IgG, QD 605 goat F(ab’)2
anti-rabbit IgG, and QD 655 rabbit F(ab’)2 anti-goat IgG, respectively. After
preparation steps, the slides were incubated with the three primary antibodies
against E-cad (1:2,000), EGFR (1:150), and β-catenin (1:2,000) simultaneously
for 1 hour at 37°C. After washing with PBS 3 times, the three QD-2nd Abs in a
cocktail solution at 1:100 dilution were added to the slides with further
incubation for 1 hour at 37°C. Slides were washed 3 times in PBS,
counterstained, mounted, and stored as described above. (2) For the sequential
method (shown schematically in Fig. 4A(ii)), the additional biomarker β-catenin
was stained by incubation with QD 655-streptavidin following staining for E-cad
with QD 565-streptavidin and EGFR with QD 605-streptavidin as above. The IgG
with the same host species as the 2nd Ab was used as a negative control.

2.6.  QD  spectral  imaging  and  signal  quantification. 

An Olympus Microscope IX71 with CRi Nuance spectral imaging and quantifying
system (CRi Inc., Woburn, MA) was used to observe and quantify the QD signals.
All cubed image files were collected from the FFPE tissue slides at 10-nm
wavelength intervals from 500 to 800 nm with an auto exposure time at 200x
magnification. Taking the cube with a long wavelength bandpass filter allowed
transmission of all emission wavelengths above 450 nm. Both mixed and separated
QD images were established after determining the QD spectral library and
unmixing the cube. Background and autofluorescence were removed for accurate
quantification of each QD signal. For comparison of the QD signals, the same
measurement threshold was applied to all images. An arbitrary unit (a.u.) was
defined as the average fluorescence signal intensity per exposure time (ms),
which was obtained directly from the Nuance software. Ten randomly selected
fields in each sample slide were used for quantification. Data are presented as
a mean of ten readings with standard deviation (SD).
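
The unmixing step can be illustrated with a generic linear model in which each pixel spectrum is a weighted sum of library spectra. The sketch below is not the Nuance algorithm, which is not described here. It assumes Gaussian emission profiles with a hypothetical 30-nm full width at half maximum, omits any background or autofluorescence component, and recovers the weights by ordinary least squares. All function names are illustrative.

```python
import math

# 500-800 nm in 10-nm steps, matching the acquisition range in the text.
WAVELENGTHS = list(range(500, 801, 10))


def gaussian_spectrum(peak_nm, fwhm_nm=30.0):
    """Idealized emission profile; the 30-nm FWHM is an assumed value."""
    sigma = fwhm_nm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [math.exp(-0.5 * ((w - peak_nm) / sigma) ** 2) for w in WAVELENGTHS]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def unmix(pixel, library):
    """Least-squares abundances from the normal equations (S^T S) a = S^T p."""
    k = len(library)
    sts = [[sum(u * v for u, v in zip(library[i], library[j])) for j in range(k)]
           for i in range(k)]
    stp = [sum(u * p for u, p in zip(library[i], pixel)) for i in range(k)]
    return solve_linear(sts, stp)


# Library for the three QD emission peaks used in this study.
library = [gaussian_spectrum(p) for p in (565, 605, 655)]
# Hypothetical abundances used to synthesize a mixed pixel spectrum.
true_abundances = [0.5, 0.2, 0.8]
pixel = [sum(a * s[i] for a, s in zip(true_abundances, library))
         for i in range(len(WAVELENGTHS))]
estimated = unmix(pixel, library)
print([round(v, 3) for v in estimated])
```

With well-separated emission peaks the three components are recovered from the synthetic mixture; real tissue data would additionally need measured library spectra and an autofluorescence term.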

3.  Results 

3.1.  Optimization  of  QD-IHF  single  staining  conditions 

The quantification results were used to evaluate the optimized working
conditions. It was found that (1) the same antigen retrieval method as used in
IHC also performed well in QD-IHF staining of FFPE samples; (2) the optimized
working conditions for primary antibodies in IHC also worked well for QD-IHF;
(3) incubation of the QD-conjugates from Invitrogen at 10-20 nM, 37°C for 1
hour was sufficient to balance maximum staining effect against minimized
non-specific binding. Non-specific binding increased with increasing
concentration and incubation time of the QD-conjugates. Once the concentration
of QD-conjugates reached 20 nM, the intensity of the QD signal showed almost no
further change, but the non-specific binding continued to increase (data not
shown), suggesting that the QD binding was saturated at 20 nM; (4) washing with
PBS up to three times did not reduce the QD signal intensity. The effects of
other washing buffers, such as PBS with Tween-20 (PBS-T) or Tris-buffered
saline with Tween-20 (TBS-T), were similar to that of PBS.
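
The methods state QD-conjugate dilutions (1:100) whereas the optimization above is given in molar terms (10-20 nM). The two can be related by a simple dilution calculation, sketched below. The 1 µM stock concentration is an assumption made for illustration and should be checked against the label of the reagent lot in hand; the function names are hypothetical.

```python
def working_concentration_nM(stock_uM, dilution_factor):
    """Final concentration in nM after a 1:dilution_factor dilution."""
    return stock_uM * 1000.0 / dilution_factor


def stock_volume_uL(final_volume_uL, dilution_factor):
    """Volume of stock needed to prepare final_volume_uL of working solution."""
    return final_volume_uL / dilution_factor


ASSUMED_STOCK_UM = 1.0  # assumption, not stated in the text

print(working_concentration_nM(ASSUMED_STOCK_UM, 100))  # 1:100 dilution
print(working_concentration_nM(ASSUMED_STOCK_UM, 50))   # 1:50 dilution
print(stock_volume_uL(200, 100))                         # stock for 200 uL at 1:100
```

Under this assumption, a 1:100 dilution sits at the lower end of the 10-20 nM window and a 1:50 dilution at the upper end.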

3.2.  Comparison  of  QD-IHF  single  staining  with  QD-2nd  Ab  or  QD-streptavidin 

For  indirect  QD-IHF  staining  of  FFPE  tissues,  either  QD-2nd  Ab  or  QD-streptavidin 
(Fig.  1A,  B)  can  be  selected.  To  evaluate  these  two  methods,  we  compared  the  signals 
when  using  the  same  concentration  and  incubation  time  for  both  QD  conjugates.  It  was 
found that the signal when staining with QD-2nd Ab was lower than that with
QD-streptavidin (Fig. 1C, D). The quantification results also showed that the average
intensity from QD-streptavidin staining was 1.36- to 1.73-fold greater than that from
QD-2nd Ab staining (Fig. 1E).

3.3.  Comparison  of  QD  signals  at  different  steps  in  QD-IHF  sequential  staining 

To investigate whether the intensity of the first QD signal changes after the
subsequent biomarker staining and repeated washing steps, we first performed
sequential QD-IHF staining of E-cad with QD565-streptavidin followed by EGFR with
QD605-streptavidin, and then reversed this sequence. The staining signals from the two
experiments  were  quantified  and  compared.  It  was  found  that  the  QD  intensity  of  E-cad 
staining  when  stained  first  was  0.104±0.050  compared  with  0.534±0.132  when  stained 
second  (Fig.  2).  Similarly,  the  intensity  of  EGFR  staining  when  stained  first  was 
0.189±0.104  compared  with  0.565±0.098  when  stained  second  (Fig.  2). 
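
As a worked check of these values, the fold difference between the signal obtained when a marker is stained second and when it is stained first can be computed directly from the reported means. The helper name `fold_change` is an assumption for illustration; only the four mean intensities come from the text, and the calculation ignores the reported SDs.

```python
def fold_change(stained_second, stained_first):
    """Ratio of the mean intensity when stained second to that when stained first."""
    if stained_first <= 0:
        raise ValueError("reference intensity must be positive")
    return stained_second / stained_first


# Mean intensities (a.u.) reported for the two staining orders.
ecad_fold = fold_change(0.534, 0.104)
egfr_fold = fold_change(0.565, 0.189)

print(f"E-cad: {ecad_fold:.2f}-fold, EGFR: {egfr_fold:.2f}-fold")
```

With these means, the second-position signal is roughly 5.1 times the first-position signal for E-cad and roughly 3.0 times for EGFR.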

3.4.  Comparison  of  QD-IHF  cocktail  method  with  the  sequential  method 


In order to avoid the decrease in signal observed with sequential staining, we applied
a mixture of three primary antibodies with distinct species origins to the tissue slides and
then incubated the slides with the corresponding QD-2nd Ab conjugates in a cocktail
solution to make the IHF staining efficient and simple [Fig. 3A(i)]. The level of each QD
signal obtained from the cocktail method was quantified and compared to that from the
sequential method. It was found that the QD signals obtained by the QD-IHF cocktail
method were consistent with one another (Fig. 3B). The intensities of E-cad, EGFR and
β-catenin were 0.318±0.015, 0.309±0.034, and 0.362±0.036, respectively (Fig. 3D). In
contrast, the signals from the sequential staining method were not consistent (Fig. 3C).
Intensities of the second and the third signals were 1.57- to 2.20-fold and 5.80- to
8.24-fold higher than the first signal, respectively (Fig. 3D).

3.5.  Stability  of  each  QD  in  the  cocktail  solution 

As recommended by the QD manufacturer, Invitrogen Corporation, we diluted the
three QDs with 6% BSA in PBS solution, and tested the signal intensity of the QDs either
singly or in a cocktail solution using a spectrofluorometer. Our study confirmed that the
QD signals in PBS appeared at the expected wavelengths with reasonable sensitivity (Fig.
4). The fluorescence intensity of each QD was not altered in the cocktail solution
compared to the single QD solution (Fig. 4). Furthermore, our study demonstrated
that the intensities of the individual QDs at the same concentration differed, in the
order of QD 655 > QD 605 > QD 565 (Fig. 4).

4.  Discussion 

The antigen retrieval method, dilution, and incubation conditions of the antibody are
the main factors that affect the results of immunostaining FFPE tissues. Several
issues should be addressed before immunostaining with QD-bioconjugates: (1) Do
the optimized working conditions for IHC work well for QD-IHF? (2) How should the
dilution ratio of the QD-conjugates and the incubation conditions be controlled to balance
an optimal signal against minimal non-specific binding? (3) How should the
QD-IHF staining procedure be optimized, especially for multiple staining? Most
researchers, based on their experience, use the same retrieval method and incubation
conditions for primary antibodies in IHC and in IHF with QDs. In this study, systematic
evaluation with quantification data confirmed that the same antigen retrieval method
and the optimized working conditions for primary antibodies in IHC also worked
well for QD-IHF. For QD-conjugates from Invitrogen, the best concentration was
10-20 nM. For washing buffers, there was no difference between PBS, PBS-T, and TBS-T.

Either  QD-2nd  Ab  conjugates  or  QD-streptavidin  conjugates  can  be  selected  for 
indirect QD-IHF staining. Although many studies have described detailed protocols [8, 9,
15], they did not provide a detailed quantitative analysis comparing these two methods. We
found that the signal when staining with QD-2nd Ab was lower than that with
QD-streptavidin at the same concentration and incubation time for both QD conjugates.
Staining with QD-streptavidin had some amplification effect. This finding is similar
to other reports.

For multiplex QD staining, the sequential staining method is used by most
researchers [8, 12]. A common question is how well each QD stains in such a procedure. We
found that the QD intensity when stained first was lower than the signal
when stained second. This result indicated that, after the initial biomarker staining, the
second blocking and washing steps reduced the intensity of the first QD signal. This
problem should be considered for multiplex QD staining with the sequential
method. To achieve the best staining of each biomarker using the QD-IHF
sequential method, the QD with the higher intensity should, in theory, be used
at the first step to offset the decrease in signal that occurs when staining in a
sequential manner.

To avoid this problem in sequential staining, we investigated a new method: three
primary antibodies with distinct species origins were selected and incubated
simultaneously on the tissue slides, followed by incubation with the corresponding
QD-2nd Ab conjugates in a cocktail solution. We named this the QD-IHF cocktail method.
When quantified and compared with the sequential method, the QD signals
obtained by the QD-IHF cocktail method were consistent, unlike those from the
sequential method.

Because the properties of nanocrystals are highly dependent on the surface
environment, it is always a consideration whether the optical emission peak maximum
and color purity of the QDs remain stable in such a cocktail solution. Testing with a
spectrofluorometer confirmed that the fluorescence intensity of each QD was not altered
in the cocktail solution compared to the single QD solution.

One drawback of the cocktail method is that it can be challenging to find
more than 4 primary antibodies with distinct species origins for simultaneous IHF
staining, which limits the use of this method for more than 4 biomarkers. When
multiplexing more than 4 biomarkers, the cocktail method combined with the sequential
method may be applied.

5.  Conclusion 

In  summary,  we  demonstrated  that  the  signal  intensities  using  the  QD-streptavidin- 
based  staining  method  were  higher  than  those  with  QD-2nd  Ab.  QD  staining  signals  using 
the  cocktail  method  were  more  consistent  and  stable  than  those  obtained  using  the 
sequential method. To achieve the optimal signal for each biomarker in a QD-IHF
multiplexed staining procedure, both the choice of staining method and the QD intensity
should be considered.

Captions


Figure  Legends 

Fig. 1 Comparison of single QD-IHF staining using QD-2nd Ab with
QD-streptavidin. A. Cartoon showing single QD-IHF staining with QD-2nd Ab conjugates; B.
Cartoon showing single QD-IHF staining with QD-streptavidin conjugates; C. RGB
image  of  E-cad  QD-IHF  staining  with  QD  565-2nd  Ab;  D.  RGB  image  of  E-cad  QD-IHF 
staining  with  QD  565-streptavidin;  E.  Signal  intensity  comparison  between  QD-2nd  Ab 
and  QD-streptavidin. 

Fig. 2 Comparison of the first signal with the second signal in sequential QD-IHF
staining. A. E-cad with QD 565-streptavidin as the first biomarker and EGFR with
QD 605-streptavidin as the second; B. EGFR with QD 605-streptavidin as the first
biomarker and E-cad with QD 565-streptavidin as the second; (i) unmixed first signal; (ii)
unmixed second signal; (iii) quantification comparison between these two biomarkers.

Fig. 3 Comparison of the QD-IHF cocktail method with the sequential method.
A(i). Cartoon showing cocktail QD-IHF staining with QD-2nd Ab conjugates; “1”
illustrates the simultaneous addition of different QDs; A(ii). Cartoon showing
sequential QD-IHF staining with QD-streptavidin conjugates; “1, 2, 3” represent the
addition of QDs at different steps; B. Cocktail QD-IHF staining of E-cad+EGFR+β-catenin
with QD 565+605+655 2nd Ab-conjugates; C. Sequential QD-IHF staining of E-cad,
EGFR, and β-catenin with QD 565-, 605- and 655-streptavidin conjugates,
respectively; (i) unmixed E-cad (QD 565) signal; (ii) unmixed EGFR (QD 605) signal;
(iii) unmixed β-catenin (QD 655) signal; D. Quantified signal comparison between these
two methods.

Fig. 4 Comparison of QD fluorescence intensity in single QD or cocktail PBS
solutions. Fluorescence intensity of each of the three QDs was detected with a
QuantaMaster™ UV VIS (Photon Technology International, Birmingham, NJ).

ABSTRACT

The interaction between the chemokine CXCL12 and its two receptors, CXCR4 and
CXCR7, is thought to play a role in tumor growth and metastasis in human cancers.
However, the expression of CXCL12, CXCR4 and CXCR7 and their roles in lung cancer
are not fully elucidated. Here we examined the expression of CXCL12, CXCR4 and
CXCR7 in 23 small cell lung cancer (SCLC) cell lines and 32 non-small cell lung
cancer (NSCLC) cell lines. CXCL12, CXCR4 and CXCR7 were overexpressed in
lung cancer cell lines compared with human non-malignant lung epithelial cells (N=6).
CXCR4 levels were significantly higher in SCLCs than in NSCLCs, while there
were no differences in the levels of CXCL12 or CXCR7 between SCLCs and
NSCLCs. Frequencies of CXCL12, CXCR4 and CXCR7 overexpression were 45%,
80% and 16%, respectively, and CXCL12 expression was positively associated with
expression of CXCR4 and CXCR7. RNA interference-mediated CXCL12 knockdown
inhibited cell growth and migration in CXCL12-overexpressing lung cancer cells, and
the effect involved inactivation of the MEK-ERK pathway. Furthermore, treatment
with an anti-CXCL12 neutralizing antibody inhibited cell growth in four
CXCL12-overexpressing lung cancer cell lines but not in CXCL12 non-expressing lines.
The results demonstrate that CXCL12, CXCR4 and CXCR7 are concomitantly
overexpressed in lung cancers; that CXCR4 is more abundantly expressed in SCLCs than
in NSCLCs; and that CXCL12 is required for lung cancer cell growth and migration
via the MEK-ERK signaling pathway. Thus, inhibition of CXCL12 activity could be a
novel therapeutic approach in lung cancer.

INTRODUCTION

Lung cancer is the leading cause of cancer-related death in the U.S. (Jemal, et al. 2007) and
worldwide. Lung cancer is divided into two major histological types: small cell lung
cancer (SCLC) and non-small cell lung cancer (NSCLC). Despite improvements in
therapy, most patients with lung cancer will die, in most cases from metastatic disease
(Minna, et al. 2002). Accordingly, there is a major need to identify novel
therapeutic targets and to elucidate the mechanisms of growth and metastasis in lung cancer.

Chemokines, structurally related, small (8-14 kDa) polypeptide signaling
molecules, bind to and activate seven-transmembrane G-protein-coupled chemokine
receptors (Murphy 1996). Chemokines are expressed by many tumor types and are
implicated in tumor cell growth, invasion, and metastasis (Balkwill 2004). Chemokine
(C-X-C motif) ligand 12 (CXCL12)/stromal cell-derived factor 1 (SDF-1), a 10 kDa
secreted protein, is a homeostatic chemokine that signals through chemokine (C-X-C
motif) receptor 4 (CXCR4), a G protein-coupled receptor, which in turn plays a role in
hematopoiesis, development, and organization of the immune system (Kryczek, et al.
2007). The interaction between CXCL12 and CXCR4 is implicated in cell
proliferation, migration, adhesion, angiogenesis, and metastasis in many cancers,
including breast, lung, ovarian, pancreatic and prostate cancers, neuroblastoma, and
hepatic cell carcinoma
(Geminder, et al. 2001; Kryczek, et al. 2005; Mochizuki, et al. 2004; Mori, et al. 2004;
Muller, et al. 2001; Phillips, et al. 2003; Sutton, et al. 2007; Tang, et al. 2007).
Recently, CXCR7/RDC1 was identified as a second receptor for CXCL12 (Balabanian,
et al. 2005) and may function in regulating growth of breast, lung and prostate cancers
(Miao, et al. 2007; Wang, et al. 2008). However, how CXCL12 and its receptors,
CXCR4 and CXCR7, contribute to the development of lung cancer is unknown.

In the present study, we found that CXCL12, CXCR4 and CXCR7 were
overexpressed in lung cancer cell lines compared to non-malignant lung epithelial cells.
CXCL12 overexpression was positively associated with the overexpression of its
receptors CXCR4 and CXCR7, suggesting concomitant expression of CXCL12 and
its receptors in an autocrine manner. RNA interference (RNAi)-mediated
knockdown of CXCL12 expression in CXCL12-overexpressing lung cancer cells led to the
inhibition of cell proliferation, colony formation and migration as well as the
dephosphorylation of MEK and ERK. Furthermore, blocking CXCL12 activity with
a CXCL12 neutralizing antibody inhibited cell growth in four
CXCL12-overexpressing lung cancer cell lines. These results suggest that CXCL12 could
play a major role in the biologic behavior of lung cancer and be a new therapeutic target
for it.

MATERIALS AND METHODS

Cell lines. We used 23 SCLC cell lines and 32 NSCLC cell lines, all of which were
obtained from the Hamon Center collection (University of Texas Southwestern Medical
Center) (Phelps, et al. 1996). Normal human bronchial epithelial cells (NHBE),
small-airway epithelial cells (SAEC) and immortalized human bronchial epithelial cells
(BEAS-2B, HBEC1, HBEC3 and HBEC4) were used as non-tumor lung controls.
NHBE and SAEC were obtained from Clonetics (San Diego, CA), and BEAS-2B was
obtained from ATCC. HBEC1, HBEC3, and HBEC4 cells were recently generated by
the authors (Ramirez, et al. 2004). Cancer cells were cultured in RPMI 1640 with
5% fetal bovine serum, and human bronchial epithelial cells were cultured in
Keratinocyte-SFM (Invitrogen, Carlsbad, CA) medium containing 25 µg/mL bovine
pituitary extract (Invitrogen) and 5 ng/mL epidermal growth factor (Invitrogen).

Quantitative real-time RT-PCR. The expression of the CXCL12, CXCR4 and
CXCR7 genes was examined by quantitative real-time RT-PCR as previously described
(Suzuki, et al. 2004). Briefly, total RNA was extracted using the RNeasy mini kit
(Qiagen, Valencia, CA), and cDNA was synthesized from 2 µg of total RNA with the
SuperScript II First-Strand Synthesis System using oligo(dT) primers (Invitrogen)
according to the manufacturer’s instructions. Primers and probes for CXCL12, CXCR4
and CXCR7 were purchased from Applied Biosystems (Tokyo, Japan). For the
quantitative analysis, the TBP gene was used as an internal reference gene to normalize
input cDNA. PCR was performed in a reaction volume of 20 µl, including 2 µl of cDNA,
using the Gene Amp 7700 Sequence Detection System and software (Applied
Biosystems). The comparative Ct method was used to compute relative expression
values. A cutoff value of the mean plus 3 SD of the non-tumor lung cell lines (N=6) was
used to define overexpression.
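
The relative-expression and cutoff calculations described above can be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes the usual form of the comparative Ct method with a doubling of product per cycle (base 2), uses the sample SD for the cutoff, and all Ct values and function names are hypothetical.

```python
from statistics import mean, stdev


def relative_expression(ct_target, ct_reference):
    """2^-(Ct_target - Ct_reference): target level relative to the reference gene (TBP)."""
    return 2.0 ** -(ct_target - ct_reference)


def overexpression_cutoff(control_levels):
    """Mean plus 3 SD of the expression levels in the non-tumor control lines."""
    return mean(control_levels) + 3 * stdev(control_levels)


def is_overexpressed(level, control_levels):
    """True when a cell line's level exceeds the mean + 3 SD cutoff."""
    return level > overexpression_cutoff(control_levels)


# Hypothetical (target Ct, TBP Ct) pairs for six non-tumor control lines.
control_cts = [(30.1, 25.0), (30.4, 25.2), (29.8, 24.9),
               (30.6, 25.1), (30.0, 25.0), (30.3, 25.3)]
controls = [relative_expression(t, r) for t, r in control_cts]

# Hypothetical tumor line with a much lower target Ct at a similar TBP Ct.
tumor_level = relative_expression(24.0, 25.1)
print(f"cutoff={overexpression_cutoff(controls):.4f}, tumor={tumor_level:.4f}, "
      f"overexpressed={is_overexpressed(tumor_level, controls)}")
```

Dividing each tumor value by the mean of the control values gives fold levels normalized to the non-tumor lines.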

Preparation and transfection of synthetic small interfering RNA (siRNA). Two
siRNAs targeting different sites of CXCL12 mRNA were purchased
from Dharmacon Inc (Lafayette, CO). An siRNA against Tax (the human leukemia
virus gene) was used as a negative control (Sunaga, et al. 2004). siRNAs were
transfected into cells using Oligofectamine transfection reagent (Invitrogen) as
previously described (Sunaga, et al. 2004), and after 72 h, cells were harvested
for further analysis.

MTT assay. Cell viability was measured using the
3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) Cell Growth Assay Kit
(Chemicon International, Temecula, CA) according to the manufacturer’s protocol.
Twenty-four h after transfection with siRNAs, trypan blue-negative viable cells were
re-plated and cultured in 96-well plates in replicates of 8. After 72 h, cells were
incubated with 0.5 mg/ml MTT for 4 h at 37°C. After MTT withdrawal, the resulting
blue formazan crystals were solubilized, and absorbance was read at 570/630 nm using a
microtiter plate reader. For the CXCL12 neutralizing assay, cells were treated with
3 µg/ml of the anti-human CXCL12/SDF-1 antibody (R&D Systems, Minneapolis, MN)
or 3 µg/ml of the IgG1 isotype control antibody (R&D Systems), and the MTT assay was
performed after 72 h.

Colony  formation  assay.  The  in  vitro  growth  characteristics  were  tested  by  a  colony 
formation  assay  (Sunaga,  et  al.  2004).  Briefly,  after  48  h  of  siRNA  transfection,  cells 
were harvested, and 500 trypan blue-negative viable cells were re-plated in each well


John  Wiley  &  Sons 



Genes,  Chromosomes  &  Cancer 


of  6-well  plates.  The  cells  were  cultured  in  RPMI 1640  supplemented  with  5%  serum, 
and  surviving  colonies  were  counted  14  days  later  after  staining  with  methylene  blue. 

Migration  assay.  Cell  migration  was  measured  in  a  modified  Boyden  chamber  as 
previously described (Yokomizo, et al. 1997). Briefly, polycarbonate filters with 8 µm
pores (Neuroprobe, Gaithersburg, MD) were coated with 100 µg/ml of collagen (Elastin
Products Company, Owensville, MO) in 0.5 M acetic acid for 16 h. The coated filter
was then placed on a 12-blind-well chemotaxis chamber (Neuroprobe), and cells
(10^5 cells in 100 µl per well) were loaded into the upper wells. The cells were
incubated  for  15  min  before  being  loaded.  After  incubation  at  37°C  in  5%  CO2  for  4  h, 
the  filter  was  disassembled.  The  upper  side  of  the  filter  was  then  scraped  free  of  cells. 
The  cells  on  the  lower  side  of  the  filter  were  fixed  with  methanol  and  stained  with  a 
Diff-Quick  staining  kit  (International  Reagent,  Kobe,  Japan).  The  number  of  cells  that 
migrated  to  the  lower  side  of  the  filter  was  counted. 

Western Blot Analysis. The cells were grown to 80 to 90% confluency and harvested,
and cellular proteins were extracted with lysis buffer (40 mM HEPES-NaOH [pH 7.4],
1% NP40, 0.5% sodium deoxycholate, 0.1% SDS, 150 mM NaCl) containing Complete
Mini, a cocktail of protease inhibitors (Roche, Indianapolis, IN). Total protein was
separated on an SDS-polyacrylamide gel and electroblotted to nitrocellulose membranes
(Bio-Rad, Hercules, CA). After blocking with 5% bovine serum albumin and 0.1%
Tween 20 in Tris-buffered saline, membranes were incubated at room temperature for 3
h with rabbit polyclonal anti-phospho-MEK (mitogen-activated protein
kinase/extracellular signal-regulated kinase kinase) 1/2 (Ser217/221) (Cell Signaling, Beverly,
MA), rabbit polyclonal anti-phospho-p44/42 MAP Kinase (ERK; Thr202/Tyr204; Cell
Signaling), rabbit polyclonal anti-MEK1/2 (Ser217/221) (Cell Signaling), rabbit
polyclonal anti-p44/42 MAP Kinase (p-ERK; Thr202/Tyr204) (Cell Signaling), rabbit
polyclonal anti-Akt (Cell Signaling) and rabbit polyclonal anti-phospho-Akt (Ser473;
Cell Signaling) antibodies. The membranes were then developed with horseradish
peroxidase-linked whole antibody (GE Healthcare, UK) and SuperSignal
chemiluminescence substrate (Pierce, Rockford, IL).

Statistical Analysis. For comparison of gene expression levels, an unpaired t test with
Welch's correction was used between two groups, and the Kruskal-Wallis test with
Dunn's multiple comparison test was used among three groups. Fisher’s exact test
was used to compare frequencies. One-way ANOVA with Bonferroni’s post hoc test
was used for comparison between groups in the MTT assay, colony formation assay, and
migration assay. All statistical analyses were performed using the GraphPad Prism
version 5.0 software program for Windows (GraphPad Software, San Diego, CA). P
values <0.05 were considered to be statistically significant.

RESULTS

Overexpression  of  CXCL12,  CXCR4  and  CXCR7  in  lung  cancer  cell  lines. 

We first examined the expression of CXCL12, CXCR4 and CXCR7 mRNA in 55 lung
cancer cell lines (32 NSCLCs and 23 SCLCs) by quantitative real-time RT-PCR
analysis. When the expression levels in lung cancers were normalized to the mean
value obtained from six different non-tumor lung cell lines, the mean levels of CXCL12,
CXCR4 and CXCR7 in lung cancers were 41, 1277 and 2.7, respectively. The
expression levels of CXCL12 (Fig. 1A) and CXCR4 (Fig. 1B) were significantly higher
in lung cancer cells than in the non-tumor cells (P<0.01 for CXCL12, and P<0.0001 for
CXCR4). CXCR7 levels were also high in lung cancer cells, but the difference from
the non-tumor cells was of borderline significance (P=0.058; Fig.
1C). CXCR4 expression levels were significantly higher in SCLCs than in NSCLCs
(Fig. 1E; P<0.001), while there was no significant difference in the levels of CXCL12
(Fig. 1D) or CXCR7 (Fig. 1F) between SCLCs and NSCLCs. The results demonstrate
that CXCL12, CXCR4 and CXCR7 are overexpressed in lung cancers and that CXCR4
expression is relatively abundant, especially in SCLC.

Positive  association  of  the  expression  status  between  CXCL12  and  its  receptors 
CXCR4  and  CXCR7 

When overexpression was defined as a level above the mean plus 3 SD in the non-tumor
lung cells, frequencies of CXCL12, CXCR4 and CXCR7 overexpression in lung cancer
cell lines were 45%, 80% and 16%, respectively (Table 1). SCLCs overexpressed all
of these genes at higher frequencies than NSCLCs, although the differences were not
significant. CXCL12 overexpression was positively associated with the
overexpression of CXCR4 (P=0.008) and CXCR7 (P=0.008), and with the presence of
overexpression of either CXCR4 or CXCR7 (P=0.015; Table 2), indicating the
concomitant expression of CXCL12 and its receptors, CXCR4 and CXCR7.
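
For readers who wish to reproduce this kind of association test on a 2×2 table of overexpression status, a two-sided Fisher's exact test can be sketched with the standard library alone. The analysis in this study was performed in GraphPad Prism; the code below is an illustrative assumption, the function name is hypothetical, and the counts are invented rather than taken from Table 2.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so that ties with the observed table are included.
    p_value = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                  if p <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical counts: rows = CXCL12 overexpressed / not overexpressed,
# columns = receptor overexpressed / not overexpressed.
print(f"P = {fisher_exact_two_sided(12, 3, 5, 10):.4f}")
```

An exact test is a natural choice here because several cells of such tables can contain small counts, where a chi-square approximation is less reliable.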

Effect  of  RNAi-mediated  CXCL12  knockdown  on  cell  growth  and  migration  in 
lung  cancer  cells. 

Since the biological significance of CXCL12 overexpression in lung cancer is obscure,
we assessed the effect of CXCL12 gene silencing on cell growth in
CXCL12-overexpressing lung cancer cell lines by using RNA interference (RNAi)
technology. Two siRNAs against different sites of CXCL12 mRNA were used to
verify that the effect of CXCL12 siRNA is specific. An siRNA against the Tax gene was
used as a negative control (Sunaga, et al. 2004). siRNAs against CXCL12 and Tax
were transfected into A549 cells, which overexpress CXCL12, and the effect on gene
silencing was monitored by real-time RT-PCR. Both siRNAs against CXCL12 led to a
similar, marked reduction of CXCL12 mRNA expression at 72 h post-transfection in
comparison to the levels in untreated cells (P<0.01; Fig. 2), while
treatment with Oligofectamine or Tax siRNA did not significantly affect the
expression levels. Thus, RNAi-mediated knockdown
successfully reduced the CXCL12 mRNA level in A549 cells.

The effect of CXCL12 knockdown on cell proliferation of A549 cells was
examined by an MTT assay. Knockdown of CXCL12 expression led to significant
inhibition of cell proliferation in A549 cells (P<0.0001; Fig. 3), while treatment with
Oligofectamine or Tax siRNA did not affect cell proliferation. In H1299 NSCLC
cells, which lack CXCL12 expression, neither treatment with CXCL12 siRNAs nor
treatment with Oligofectamine or Tax siRNA affected cell proliferation (Fig. 3).
We further employed the colony formation assay to assess the effect of CXCL12
siRNAs on cell growth in A549 cells. RNAi-mediated CXCL12 knockdown
significantly inhibited colony formation in A549 cells (P<0.01; Fig. 4A, B) but not in
H1299 cells (Fig. 4B). These results indicate that CXCL12 is required for in vitro cell
growth in lung cancer cells that overexpress CXCL12.

We next examined the effect of RNAi-mediated CXCL12 knockdown on cell
migration in A549 cells. CXCL12 knockdown resulted in a significant
attenuation of cell migration (P<0.0001; Fig. 5), while treatment with
Oligofectamine or Tax siRNA did not inhibit migration. The results indicate that
CXCL12 plays a role in migration of CXCL12-overexpressing lung cancer cells.

Growth  inhibition  of  CXCL12-overexpressing  lung  cancer  cells  by  the 
anti-CXCL12  neutralizing  antibody. 

We further examined the effect of an anti-CXCL12 neutralizing antibody on cell
proliferation in four CXCL12-overexpressing lung cancer cell lines: A549, HCC95,
H1264 and H661. The CXCL12 levels of A549, HCC95, H1264 and H661 were 317,
557, 210 and 157, respectively, when the expression levels were normalized to the
mean level of the six non-tumor lung cell lines. Viable cells were significantly
reduced to 32% in A549, 47% in HCC95, 49% in H1264 and 32% in H661 by
treatment with the CXCL12 neutralizing antibody at a concentration of 3 µg/ml (Fig. 6;
P<0.001 in all lines) but not by treatment with the control antibody. On the other
hand, neutralization of CXCL12 did not affect cell proliferation in the lung cancer cell
lines H1299 and H2009, in which CXCL12 expression was undetectable (Fig. 6).
These results demonstrate that blocking CXCL12 activity led to inhibition of cell
proliferation in CXCL12-overexpressing lung cancer cells but not in lung cancer cells
lacking CXCL12 expression.

RNAi-mediated knockdown of CXCL12 expression led to inactivation of the
MEK-ERK pathway.

In order to elucidate how CXCL12 overexpression regulates signal
transduction in lung cancer cells, the effect of RNAi-mediated knockdown of CXCL12
on phosphorylation of MEK, ERK and Akt was examined. In A549 cells,
RNAi-mediated CXCL12 knockdown reduced the levels of phosphorylated MEK and
phosphorylated ERK (Fig. 7), while CXCL12 knockdown did not affect the levels of
phosphorylated Akt (Fig. 7). The results suggest that cell growth in
CXCL12-overexpressing lung cancer cells involves activation of the MEK-ERK
pathway.

DISCUSSION

Several lines of evidence have indicated that CXCL12-CXCR4 interactions play a role
in tumor growth and metastasis in lung cancer. In previous studies, CXCR4 was
shown to be abundantly expressed in SCLC, and CXCL12-induced activation of CXCR4
enhanced cell invasion and adhesion through integrin activation (Burger, et al. 2003;
Hartmann, et al. 2005; Kijima, et al. 2002). Recently, elevated CXCR4 expression was
observed in NSCLC cell lines and NSCLC tumors, and CXCR4 expression was
implicated in the metastatic potential of NSCLC (Oonakahara, et al. 2004; Phillips, et al.
2003; Su, et al. 2005). Consistent with these findings, we observed CXCR4
overexpression in lung cancer cell lines. Considering that most of the lung cancer cell
lines we used here were established from advanced tumors with metastasis and were
derived from metastatic lesions (Phelps, et al. 1996), it is likely that CXCR4 overexpression
is associated with high metastatic potential in lung cancer. In addition, we found that
CXCR4 expression was relatively more abundant in SCLC cells than in NSCLC cells.

In  contrast  to  the  evidence  available  for  CXCR4,  there  are  few  studies 
assessing  the  expression  of  CXCL12  and  the  recently  identified  receptor  CXCR7 
(Balabanian,  et  al.  2005)  in  lung  cancers.  In  agreement  with  a  previous  study  showing 
that  CXCL12  was  expressed  in  the  majority  of  NSCLC  tumors  (Wald,  et  al.  2006),  we 
confirmed CXCL12 overexpression in lung cancer cell lines. On the other hand,
we also found that 15 (47%) of NSCLC lines and 8 (35%) of SCLC lines exhibited
lower or undetectable CXCL12 expression compared with non-tumor lung cells. A
recent study reported that silencing of CXCL12 by aberrant methylation
correlated with poor prognosis in NSCLCs (Suzuki, et al. 2008). Of note, the same
study also demonstrated that positive expression of CXCL12 correlated with lymph
node involvement, advanced stage, and poor prognosis in NSCLC tumors. Therefore,
CXCL12 may have opposing functions depending on the cellular context, as has been
shown for other proteins (e.g., RAS (Crespo and Leon 2000) and CAV1 (Sunaga, et al.
2004)). This may explain why RNAi-mediated CXCL12 knockdown or treatment with a
CXCL12 neutralizing antibody did not affect cell growth in CXCL12-nonexpressing
NSCLC lines in the current study. Further functional studies are needed to determine
whether CXCL12 can act as a tumor suppressor in lung cancers in which CXCL12 is
silenced by promoter hypermethylation.

As for CXCR7, recent studies have reported that CXCR7 is abundantly
expressed in various types of human cancer cell lines, including the NSCLC cell line
A549 (Burns, et al. 2006), and contributes to tumor development and progression in lung
and breast cancers (Miao, et al. 2007). Our finding of CXCR7 overexpression in lung
cancer cells supports these reports and suggests that CXCR7, as well as CXCR4, is
implicated in lung cancer development through its interaction with CXCL12.

Chemokines  display  autocrine  and  paracrine  roles  related  to  growth  and 
metastasis  of  human  cancers  including  lung  cancer  (Strieter,  et  al.  2004).  The 
autocrine  CXCL12-CXCR4  system  has  been  shown  to  be  involved  in  tumor 
development (Kryczek, et al. 2007; Raman, et al. 2007; Uchida, et al. 2007). In this
study, we found concomitant expression of CXCL12 and its receptors CXCR4 and
CXCR7 in lung cancer cells, suggesting that an autocrine CXCL12-CXCR4/CXCR7
system exists in lung cancer cells and that CXCL12 could be an autocrine growth
factor.

Researchers have demonstrated that the interaction of CXCL12 with its receptors
has a proliferative effect on various types of cancer cells (Darash-Yahana, et al. 2004;
Katayama, et al. 2005; Marchesi, et al. 2004; Scotton, et al. 2002; Sutton, et al. 2007),
including SCLC cells (Kijima, et al. 2002), and plays an essential role in tumor invasion
and metastasis (Balkwill 2004). For example, migration of lung cancer cells was
enhanced in response to CXCL12 (Phillips, et al. 2005) or by forced expression of
CXCR4 (Su, et al. 2005). Here we used RNAi, an approach different from those of
these previous studies, to elucidate the function of endogenous CXCL12 in lung
cancer cells. Our findings that RNAi-mediated CXCL12 knockdown inhibited cell
proliferation, colony formation and migration in CXCL12-overexpressing lung cancer
cells suggest that CXCL12, which is expressed and secreted by cancer cells, can act as
an activator of cell proliferation and migration in lung cancer. The growth-inhibitory
effect of CXCL12 knockdown prompted us to examine whether a CXCL12 neutralizing
antibody could inhibit cell growth in CXCL12-overexpressing lung cancer cells. The
marked decrease in cell viability caused by the anti-CXCL12 antibody indicates that the
CXCL12-CXCR4/CXCR7 interaction might be critical for cell growth and that
inhibition of CXCL12 has therapeutic potential for lung cancers in which this
interaction is active. Further studies in vivo will be needed to establish the therapeutic
effectiveness of an anti-CXCL12 antibody for lung cancer.

It  has  been  reported  that  CXCL12  has  multiple  functions  via  regulation  of 
MAPK  and  PI3K-Akt  signaling  pathways  (Kryczek,  et  al.  2007;  Liang,  et  al.  2007; 
Wang, et al. 2000). In the present study, RNAi-mediated CXCL12 knockdown
decreased phosphorylation of MEK and ERK in lung cancer cells. In agreement with
previous studies (Arai, et al. 2006; Huang, et al. 2007; Mori, et al. 2004; Tang, et al.
2007), the current results suggest that the CXCL12-CXCR4 interaction can activate
the MEK-ERK signaling pathway, which in turn promotes cell proliferation and migration
(Huang, et al. 2004; Shaul and Seger 2007). The effect of CXCL12 knockdown on
Akt phosphorylation was also examined because previous studies indicated that
CXCL12 can induce activation of Akt (Wang, et al. 2000). However, Akt
phosphorylation was not affected by knockdown of CXCL12 expression; therefore, the
PI3K-Akt pathway does not appear to be involved downstream of the
CXCL12-CXCR4/CXCR7 interaction, at least in lung cancer cells.

In conclusion, our expression analysis of CXCL12, CXCR4 and CXCR7 in
a large number of lung cancer cell lines demonstrates that these genes are concomitantly
overexpressed in lung cancer cells. RNAi-mediated CXCL12 knockdown led to
inhibition of cell growth and migration as well as dephosphorylation of MEK and ERK
in CXCL12-overexpressing lung cancer cells, suggesting that
CXCL12-CXCR4/CXCR7 interactions play a role in the development of lung cancer
through activation of the MEK-ERK pathway. The growth-inhibitory effect of the
CXCL12 neutralizing antibody in CXCL12-overexpressing lung cancer cells raises
the possibility that inhibition of CXCL12 activity could be a novel therapeutic approach
in lung cancer. Further studies will be needed to determine whether CXCL12
expression could serve as a biomarker and therapeutic target for lung cancer.
FIGURE LEGENDS

Figure  1.  Comparison  of  mRNA  levels  of  the  (A)  CXCL12,  (B)  CXCR4  and  (C)  CXCR7 
genes  relative  to  those  of  the  TBP  gene  between  lung  cancer  cell  lines  (N=55)  and 
nontumor  lung  cell  lines  (N=6)  as  measured  by  quantitative  real-time  RT-PCR  analysis. 
The expression of (D) CXCL12, (E) CXCR4 and (F) CXCR7 was also compared
between SCLC and NSCLC cell lines. Expression levels were normalized to the
mean level of the non-tumor cells. Bars indicate the means of the relative CXCL12,
CXCR4 and CXCR7 expression.
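
The two-step normalization described in the Figure 1 legend (target gene relative to TBP, then relative to the mean of the non-tumor cells) can be sketched as follows. This is a minimal illustration that assumes linear-scale quantities are already available for each gene; the legend does not say how those quantities were derived, and the function names and numbers are hypothetical rather than taken from the study.

```python
from statistics import mean

def relative_expression(target, reference):
    """Ratio of target-gene to reference-gene (e.g. TBP) quantity for one sample."""
    if reference <= 0:
        raise ValueError("reference quantity must be positive")
    return target / reference

def normalize_to_controls(tumor_ratios, control_ratios):
    """Express each tumor ratio as a fold change over the mean control ratio."""
    baseline = mean(control_ratios)
    return [ratio / baseline for ratio in tumor_ratios]

# Hypothetical (target, TBP) quantities in arbitrary units, not data from the study.
control_samples = [(2.0, 4.0), (3.0, 4.0), (1.0, 4.0)]
tumor_samples = [(8.0, 4.0), (0.5, 4.0)]

control_ratios = [relative_expression(t, r) for t, r in control_samples]
tumor_ratios = [relative_expression(t, r) for t, r in tumor_samples]
fold_changes = normalize_to_controls(tumor_ratios, control_ratios)
```

With this convention a value above 1 means higher expression than the average non-tumor line and a value below 1 means lower expression; the cutoff used to call a line "overexpressing" is not given in this section.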

Figure 2. RNAi-mediated knockdown of CXCL12 mRNA expression. NT: treatment
with medium alone; Oligo: treatment with Oligofectamine reagent alone. A549 cells
were transfected with 100 nM siRNAs against either CXCL12 (SDF1-1 and SDF1-2) or
Tax. After 72 h, cells were harvested and quantitative real-time RT-PCR was
performed. Columns represent the mean CXCL12 mRNA levels ± SD (bars)
obtained from three independent experiments. Treatment with medium alone was set
at 100%. *, P<0.01.

Figure 3. RNAi-mediated knockdown of CXCL12 expression inhibited cell
proliferation in CXCL12-overexpressing A549 cells but not in CXCL12-nonexpressing
H1299 cells as measured by an MTT assay. siRNAs were transfected into the cells,
and the MTT assay was performed in replicates of 8 at 4 days after transfection. Columns
represent the mean ± SD (bars). Treatment with medium alone was set at 100%. *,
P<0.0001.



John  Wiley  &  Sons 




Genes,  Chromosomes  &  Cancer 


Figure 4. CXCL12 siRNAs inhibited colony formation in CXCL12-overexpressing
A549 cells but not in CXCL12-nonexpressing H1299 cells. After 48 h of siRNA
transfection, cells were replated for colony formation assay in liquid culture, and after
14 days, surviving colonies were stained with methylene blue. (A) Stained colonies of
A549 cells are shown. (B) Columns represent the mean ± SD (bars) obtained from
three independent experiments. Treatment with medium alone was set at 100%. *,
P<0.01.

Figure 5. CXCL12 siRNA inhibited migration of CXCL12-overexpressing A549 cells
as measured by migration assay. After 48 h of siRNA transfection, 10^5 cells/well were
loaded into a 12-blind-well chemotaxis chamber. Columns represent the mean ± SD
(bars) obtained from three independent experiments. Treatment with medium alone
was set at 100%. *, P<0.0001.

Figure 6. The CXCL12 neutralizing antibody inhibited cell proliferation in
CXCL12-overexpressing lung cancer cell lines (A549, HCC95, H1264 and H661) but
not in CXCL12-nonexpressing lung cancer cell lines (H1299 and H2009). Cells were
treated with the anti-CXCL12 neutralizing antibody (3 µg/ml) or the IgG1 control
antibody (3 µg/ml) for 72 h and cell proliferation was measured by MTT assay.
Columns represent the mean ± SD (bars) obtained from four independent experiments.
Nontreatment was set at 100%. *, P<0.0001.
Figure 7. RNAi-mediated knockdown of CXCL12 reduced the levels of
phosphorylated MEK (p-MEK) and phosphorylated ERK (p-ERK) but not
phosphorylated Akt (p-Akt). Cells were harvested at 72 h post-transfection of siRNAs
and Western blotting was performed. Fifteen µg of total protein were loaded in each
lane.
Table 1. Frequencies of overexpression of CXCL12 and its receptors CXCR4 and
CXCR7 in lung cancer cell lines.

            Number of overexpressing cell lines (%)
Gene        Total       SCLC        NSCLC
CXCL12      25 (45)     14 (61)     11 (34)
CXCR4       44 (80)     21 (91)     23 (72)
CXCR7        9 (16)      5 (22)      4 (13)
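
The percentages in Table 1 can be recomputed from the counts with a few lines of arithmetic. The sketch below assumes denominators of 55 lines in total, 23 SCLC and 32 NSCLC; the total comes from the Figure 1 legend, while the SCLC and NSCLC denominators are inferred from the reported percentages rather than stated in this section.

```python
from math import floor

# Denominators: 55 is given in the Figure 1 legend; 23 and 32 are an
# assumption inferred from the percentages reported in Table 1.
DENOMINATORS = {"Total": 55, "SCLC": 23, "NSCLC": 32}

# Numbers of overexpressing cell lines as listed in Table 1.
COUNTS = {
    "CXCL12": {"Total": 25, "SCLC": 14, "NSCLC": 11},
    "CXCR4": {"Total": 44, "SCLC": 21, "NSCLC": 23},
    "CXCR7": {"Total": 9, "SCLC": 5, "NSCLC": 4},
}

def percent(count, total):
    """Whole-number percentage, rounding halves up (12.5 becomes 13)."""
    return floor(100 * count / total + 0.5)

percentages = {
    gene: {group: percent(n, DENOMINATORS[group]) for group, n in row.items()}
    for gene, row in COUNTS.items()
}
```

Under these assumed denominators the recomputed values agree with every percentage printed in the table. Half-up rounding is used because Python's built-in `round` rounds 12.5 to 12.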
Table 2. Correlation between the expression of CXCL12 and its receptors CXCR4 and CXCR7 in lung
cancer cell lines.

             CXCR4(+)  CXCR4(-)  CXCR7(+)  CXCR7(-)  CXCR4(+) or  CXCR4(-) and
                                                     CXCR7(+)     CXCR7(-)
CXCL12 (+)   24        1         8         17        24           1
CXCL12 (-)   20        10        1         29
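
One way to test the associations tabulated in Table 2 is a two-sided Fisher's exact test on each 2x2 sub-table. The sketch below is an illustration only: the statistical test used by the authors is not stated in this section, and the function name is hypothetical. The counts are the CXCR4 and CXCR7 columns of Table 2.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    probs = [prob(x) for x in range(lowest, highest + 1)]
    # Small relative tolerance so ties with the observed table are included.
    return min(1.0, sum(p for p in probs if p <= p_observed * (1 + 1e-9)))

# Rows: CXCL12(+), CXCL12(-); columns: receptor(+), receptor(-).
p_cxcr4 = fisher_exact_two_sided(24, 1, 20, 10)
p_cxcr7 = fisher_exact_two_sided(8, 17, 1, 29)
```

The last two columns of the table are not analyzed here because the CXCL12(-) row is incomplete in this copy of the table.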
Fig. 6
receptor gene signature as a predictive (rather than retrospective correlative) biomarker for lung
cancer.  A  new  table  reporting  the  Cox  regression  multivariate  analysis  is  now  also  included  to  show 
that  in  addition  to  being  a  predictive  marker,  the  risk  of  death  that  correlates  with  the  NR  signature  is 
highly  significant  and  independent  of  any  clinical  variables.  We  believe  our  work  breaks  precedent 
with  other  biomarker  studies  by  making  the  following  important  contributions  to  the  field: 

1. We demonstrate that expression of the nuclear receptor superfamily in both the tumor and normal
tissue of patients with non-small cell lung cancer (NSCLC) provides a novel, robust prognostic
indicator of patient survival and progression of disease.

2. We provide the first single gene signatures for both survival and identification of early-stage,
high-risk patients; early-stage disease is considered the most clinically meaningful stage for predicting outcome.

3.  We  show  that  the  predictive  power  of  the  nuclear  receptor  signature  requires  only  the  mRNA 
expression  data  and  is  independent  of  any  other  clinical  features. 

4.  We  also  show  that  the  risk  of  death  associated  with  nuclear  receptor  expression  profiling  is 
independent  of  other  clinical  features. 

5.  To  our  knowledge,  this  is  the  first  prognostic  biomarker  set  for  lung  cancer  that  identifies  a  class  of 
validated  drug  targets  that  might  be  used  therapeutically  to  treat  individual  patients.  Since  nuclear 
receptors  are  well-studied  transcription  factors,  they  also  offer  a  testable  set  of  genes  that  may  lead  to  a 
deeper  understanding  of  the  genesis  and  progression  of  the  disease. 

6. We provide a Sweave document (a literate programming package that permits reproduction of
high-throughput data analysis). A major problem with most gene signature studies in the cancer field has
been the inability of others to reproduce the work (e.g., see Nat. Med. 13, 1276 [2007]). Sweave
overcomes this problem and, to our knowledge, our manuscript would be the first of its kind in the field
to provide such a document.

We suggest the following as potential reviewers for this work: Charles Sawyers at Memorial
Sloan-Kettering Cancer Center (sawyersc@mskcc.org), Myles Brown at Dana-Farber Cancer
Institute (Myles Brown@dfci.harvard.edu), David Carbone at Vanderbilt-Ingram Cancer Center
(d.carbone@vanderbilt.edu), and Bert O’Malley at Baylor College of Medicine
(berto@bcm.tmc.edu).

Thank  you  for  your  consideration  of  our  work  for  publication  in  Cancer  Cell. 
David J. Mangelsdorf, Ph.D.
Response to Reviewers


A  response  to  reviewers  is  not  applicable  for  this  submission,  since  this  was  an  invited 
revision by the editor and was never sent out for peer review. Nevertheless, below I
have briefly outlined the types of changes we made in consultation with the editor, Dr.
Xiaohong  Helena  Yang. 

1.  We  re-wrote  large  sections  of  the  paper  (including  the  Summary  and  Significance)  to 
clarify  the  key  points  and  conclusions  that  set  this  work  apart  as  a  paradigm  shift  for 
biomarker  analysis.  We  also  emphasized  the  differences  between  the  predictive 
(prognostic)  gene  signature  analysis  we  did  versus  other  methods  (e.g.,  retrospective 
studies). 

2.  We  included  a  new  table  (Table  2)  that  provides  a  multivariate  Cox  regression  analysis 
to  show  that  the  NR  signature  is  independent  of  clinical  variables  (which  further  supports 
one  of  the  key  findings  of  the  paper). 

3.  We  removed  supplemental  figure  S4,  which  was  somewhat  confusing  and  redundant  to 
other  data  presented  in  the  paper. 
SUMMARY


Utilizing quantitative real-time PCR expression data from 30 microdissected
non-small cell lung cancers (NSCLCs) and their pair-matched normal lung epithelium,
we  identified  the  nuclear  receptor  (NR)  superfamily  as  a  biomarker  that  predicts 
patient  survival  and  disease  progression  in  both  tissues.  The  NR  signature  from 
the  NSCLC  samples  was  validated  in  an  independent  microarray  dataset  from 
442  resected  lung  adenocarcinomas.  Remarkably,  the  prognostic  signature  in 
tumors  could  be  distilled  to  expression  of  progesterone  receptor  and  short 
heterodimer  partner  as  single  gene  predictors  of  survival  and  high-risk  stage  1 
disease,  respectively.  Identification  of  prognostic  NR  expression  patterns  in 
tumor  and  normal  lung  epithelium  from  individual  patients  not  only  provides 
validated therapeutic targets but also may reveal the pathways involved in lung
tumorigenesis.

SIGNIFICANCE 

Despite  numerous  attempts,  little  progress  has  been  made  to  identify 
biomarkers  that  can  be  used  in  lung  cancer  patients  to  predict  outcome  and 
guide  therapy.  In  this  study,  we  analyzed  expression  of  the  NR  superfamily  to 
provide  a  unique  prognostic  signature  that  can  both  predict  patient  survival  and 
identify  early  stage  high-risk  patients.  Importantly,  NR  expression  in  either  the 
lung  tumor  or  the  adjacent  normal  tissue  of  patients  has  prognostic  power. 
Because  NRs  are  ligand-dependent  transcription  factors  and  targets  of  proven 
drugs,  receptors  identified  in  these  profiles  should  provide  promising  targets  for 
mechanistic studies of lung cancer oncogenesis and therapeutics to treat
individual  patients.  This  study  highlights  the  potential  of  using  a  rationally 
designed  set  of  genes  as  theragnostic  biomarkers. 
INTRODUCTION


The  prevalence  of  lung  cancer  as  the  primary  cause  of  cancer  death  in  the  U.S. 
has  led  to  renewed  efforts  to  obtain  biomarker  signatures  that  provide  prognostic 
information  to  guide  therapy  for  individual  patients  (i.e.,  “personalized  medicine”) 
(Jemal  et  al.,  2008;  Sun  et  al.,  2007;  Xie  and  Minna,  2008).  A  key  strategy 
toward  this  goal  has  been  to  identify  tumor  biomarkers  using  gene  expression 
profiling,  combined  with  standard  clinical  data  (e.g.,  age,  gender,  smoking 
history,  histology  and  clinical  pathologic  stage)  (Shedden  et  al.,  2008;  Xie  and 
Minna,  2008).  Such  data  would  be  particularly  useful  for  choosing  the 
appropriate  therapeutic  options  for  early  stage  patients  (e.g.,  stage  I  non-small 
cell  lung  cancers,  [NSCLCs]),  where  surgical  resection  (with  or  without  adjuvant 
chemotherapy)  is  still  the  standard  treatment  (Minna  and  Schiller,  2008). 
However,  even  under  favorable  circumstances,  a  substantial  fraction  of  patients 
relapse  and  die  (Minna  and  Schiller,  2008).  These  statistics  have  led  to  multiple 
genome-wide  expression  studies  to  develop  signatures  that  also  predict  patient 
outcomes  (Chen  et  al.,  2007;  Potti  et  al.,  2006;  Shedden  et  al.,  2008).  While 
these  approaches  have  identified  potential  biomarkers,  nearly  all  of  the  gene 
signatures  have  been  different  and  they  have  not  provided  a  basis  for 
understanding  lung  cancer  pathogenesis  (Beer  et  al.,  2002;  Chen  et  al.,  2007; 
Endoh  et  al.,  2004;  Lu  et  al.,  2006;  Potti  et  al.,  2006;  Shedden  et  al.,  2008). 
Perhaps  more  importantly,  to  date  these  studies  have  failed  to  yield  new 
therapeutic  targets.  Clearly,  identification  of  biomarkers  that  also  provide 
hypotheses for mechanism-based studies of carcinogenesis, and offer new
therapeutic  targets,  would  be  of  tremendous  benefit. 

Nuclear  receptors  (NRs)  are  a  large  family  of  ligand-dependent 
transcription  factors  that  respond  to  a  number  of  hormonal  and  dietary-derived 
lipids,  including  endocrine  steroids,  fat-soluble  vitamins,  fatty  acids,  and 
cholesterol  metabolites  (Chawla  et  al.,  2001).  NRs  are  also  among  the  most 
successful  targets  of  drugs  approved  to  treat  numerous  diseases,  including 
cancer  (Gronemeyer  et  al.,  2004;  Shulman  and  Mangelsdorf,  2005).  Previously, 
we  have  shown  that  NR  expression  profiling  can  be  used  to  reveal  the 
mechanistic  basis  of  the  hierarchical  transcriptional  networks  that  govern  a 
number  of  physiological  processes,  including  development,  differentiation, 
reproduction,  circadian  rhythm,  and  metabolism  (Barish  et  al.,  2005;  Bookout  et 
al.,  2006;  Fu  et  al.,  2005;  Xie  et  al.,  2009;  Yang  et  al.,  2006).  In  the  present 
study,  we  investigated  the  potential  role  of  the  48  members  of  the  NR 
superfamily  as  theragnostic  indicators  in  lung  cancer.  Our  strategy  of  examining 
expression  of  NRs,  which  are  known  therapeutic  targets  with  defined 
mechanisms  of  action,  differs  from  previous,  open-ended  genome-wide 
microarray  studies  that  have  yet  to  yield  useful  clinical  targets.  Our  goal  was  to 
use  both  normal  and  tumor  NR  expression  signatures  as  clinical  tools  to  classify 
patients  with  different  survival  outcomes,  characterize  the  NR  transcriptional 
networks  that  govern  lung  cancer  pathobiology,  and  eventually  develop  NR- 
selective  therapies  targeted  at  hormonal  manipulation  of  lung  cancer.  Utilizing 
quantitative,  real-time  PCR  (QPCR),  we  evaluated  the  expression  of  the  NR 
superfamily in normal lung and pair-matched NSCLC tumor lesions
microdissected  from  30  individuals.  The  prediction  model  built  from  these  30 
resected  NSCLCs  was  then  validated  in  a  recent  NCI-sponsored,  multi- 
institutional,  genome-wide  microarray  dataset  taken  from  442  resected  lung 
adenocarcinomas  (Shedden  et  al.,  2008).  We  found  that  NR  expression  profiles 
can  both  predict  patient  survival  and  identify  early  stage  high-risk  patients.  Of 
particular  interest,  this  prediction  model  was  dependent  solely  on  the  expression 
signature  for  the  NR  superfamily  and  did  not  require  inclusion  of  clinical  features. 
Furthermore, we found that expression levels of progesterone receptor (PR, NR3C3)
and short heterodimer partner (SHP, NR0B2) are the principal components that
describe  the  predictive  power  of  the  NR  signature,  and  thereby  represent  the  first 
single  gene  predictors  for  overall  patient  survival  and  high-risk,  early  stage 
disease,  respectively.  Finally,  we  provide  a  Sweave  document  (Coombes  et  al., 
2007;  Gentleman,  2005)  as  Supplemental  Data  online  that  contains  a  literate 
programming  package  to  permit  the  full  reproduction  of  our  analysis. 

RESULTS 

Identification  of  the  NR  superfamily  as  a  prognostic  biomarker  for  lung 
cancer 

QPCR  was  used  to  analyze  the  mRNA  expression  of  all  48  members  of  the  NR 
superfamily  in  a  cohort  of  30  NSCLC  tumors  and  their  pair-matched, 
histologically  normal  lung  epithelium  obtained  by  microdissection  from  the  MD 
Anderson  Cancer  Center  (MDACC).  Prior  studies  have  used  macrodissected 
lung tumor samples that included variable fractions of tumor cells (ranging from
20-80%).  Our  analysis  of  microdissected  material  permitted  an  unprecedented, 
quantitative  comparison  of  NR  expression  in  tumor  cells  to  adjacent  normal  lung 
epithelium.  The  inclusion  of  normal  tissue  from  the  same  patient  also  provided  an 
internal  control  for  systemic  (e.g.,  hormonal)  and  local  (e.g.,  microenvironmental) 
factors,  and  it  allowed  us  to  investigate  whether  NR  expression  from  normal  lung 
epithelium contained prognostic information. Detailed clinical data on the 30
patient cohort are given in supplemental data (Tables S1 and S2), and the QPCR
datasets of NR expression are shown and summarized in supplemental data
(Figure S1 and Table S3) (raw datasets are available at www.NURSA.org).
Inspection  of  these  data  showed  that  there  was  considerable  variation  in  NR 
expression  between  patients.  Therefore,  we  investigated  whether  any  prognostic 
association  existed  between  NR  expression  and  patient  clinical  features  including 
disease  progression  and  death.  Unsupervised  cluster  analysis  of  NR  expression 
in  lung  tumors  revealed  two  distinct  clusters  of  tissue  samples  (Figure  1A).  Note 
that  one  tissue  sample  (857-SCC)  did  not  fall  into  either  cluster  and  was  treated 
as  an  outlier.  To  our  surprise,  the  two  major  branches  of  the  dendrogram  (cluster 
1  and  cluster  2)  were  associated  with  both  overall  survival  rates  (P=0.001)  and 
disease  progression  rates  (P=0.062),  but  no  other  clinical  features  (Table  1).  In 
this  study,  disease  progression  was  defined  as  either  recurrence  of  lung  cancer 
or patient death. Indeed, Kaplan-Meier plots for survival and disease progression
showed that cluster 1 and cluster 2 segregated patients into those with poor and
good prognostic outcomes, respectively (P=0.000048 for survival; P=0.0018 for
disease progression) (Figure 1B and C). These findings suggest that the NR
signature defines an independent prognostic biomarker for survival and disease
progression.
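
The Kaplan-Meier curves referred to above are built with the product-limit estimator, which can be sketched in a few lines. This is a generic illustration on made-up follow-up times, not a reconstruction of the patient data or of the authors' analysis code; the function name is hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up duration for each patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up times (months) and event indicators.
curve = kaplan_meier([6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1])
```

Censored patients lower the number at risk without producing a step in the curve, which is why they appear only through `at_risk`. Comparing two such curves, for example cluster 1 versus cluster 2, would additionally need a test such as the log-rank test, which is not shown here.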

Analysis  of  the  NR  expression  profiles  in  the  histologically  normal  lung 
epithelium  showed  that  the  patterns  in  the  normal  tissues  were  also  predictive  for 
both  survival  and  time  to  progression  (Figure  S2).  Although  18  out  of  48  NRs 
showed  a  statistically  significant  correlation  in  expression  between  normal  and 
tumor  tissue,  the  NRs  that  correlated  with  prognosis  were  different  when 
comparing  normal  lung  epithelium  to  tumor  cells  (see  below). 

It  is  of  interest  that  the  unsupervised  cluster  analysis  also  revealed  two 
major  clusters  of  NR  genes  that  exhibited  relatively  high  or  low  expression  in  the 
majority  of  the  tumor  samples  (Figure  1A),  suggesting  that  these  receptors  may 
be  of  mechanistic  importance  to  lung  cancer  pathology. 

Validation  of  the  NR  gene  signature  as  a  predictor  of  patient  survival 

To validate the use of NR expression as an independent prognostic marker, we
used the NR gene signature to build a predictive model from the tumor samples
of the 30-patient cohort by using recursive-partitioning tree analysis (RPART),
and we further tested the prediction performance by the leave-one-out
cross-validation (LOOCV) method. The hazard ratio (HR), i.e., risk of death, for the
predicted high-risk vs. the low-risk signatures using tumor samples was 7.03
(95% confidence interval [CI], 2.22 to 22.3; P=0.00015) (Figure 2A).
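
The leave-one-out procedure described above can be sketched as follows. To keep the example self-contained, a single-threshold rule on one feature stands in for the recursive-partitioning tree; it is not the RPART model used in the study, and the data, labels and function names are hypothetical.

```python
def fit_stump(xs, ys):
    """Choose the threshold on one feature that minimizes training errors.

    The fitted rule predicts 1 (high risk) when x > threshold.
    This is a stand-in for a full recursive-partitioning tree.
    """
    best_errors, best_threshold = len(ys) + 1, None
    for threshold in sorted(set(xs)):
        errors = sum((x > threshold) != bool(y) for x, y in zip(xs, ys))
        if errors < best_errors:
            best_errors, best_threshold = errors, threshold
    return best_threshold

def loocv_predictions(xs, ys):
    """Predict each sample from a model trained on all other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        threshold = fit_stump(train_x, train_y)
        predictions.append(int(xs[i] > threshold))
    return predictions

# Hypothetical expression values and risk labels (1 = high risk).
xs = [0.2, 0.4, 0.5, 0.9, 1.1, 1.4, 1.6, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
preds = loocv_predictions(xs, ys)
accuracy = sum(p == y for p, y in zip(preds, ys)) / len(ys)
```

Because each prediction comes from a model that never saw the held-out sample, the predicted high-risk and low-risk groups can then be compared for survival without reusing training data, which is the point of the LOOCV step.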
Because the majority of gene expression data now available from lung
cancer  samples  comes  from  microarray  expression  studies,  we  investigated 
whether  the  NR  expression  profile  could  be  validated  within  one  of  these 
previously  acquired  datasets.  One  of  the  largest  independent  lung  cancer 
microarray  datasets  available  is  the  recently  published  NCI  Director’s  Consortium 
for  study  of  lung  cancer  involving  442  resected  NSCLCs  (Shedden  et  al.,  2008). 
From  that  study,  the  Affymetrix  U133A  microarray  data  for  the  48  NR  gene 
expression  signatures  were  excerpted  and  used  in  three  different  ways  to 
validate the prognostic value of NR expression. We first validated the 30-sample
QPCR dataset on the 442-sample microarray dataset; and then we developed an
NR signature from the microarray dataset and validated it on the QPCR data.
Both directions of training and testing provided statistically significant predictive
power for patient survival (Figure 2B and C), with validation of the QPCR data
within the microarray data being the most significant (Figure 2B). The higher
significance  value  for  the  QPCR  dataset  likely  reflects  the  greater  dynamic  range 
and  quantitative  nature  of  the  QPCR  assay,  and  the  greater  homogeneity  of  the 
microdissected  samples.  Finally  we  divided  the  microarray  data  into  training  and 
testing  sets  for  validation  (Figure  2B-D).  For  this  analysis  the  442-sample 
dataset  was  divided  into  training  and  testing  sets,  and  analyzed  using  the 
predictive  RPART  model.  We  used  the  same  training  and  testing  strategy  as  in 
the  genome-wide  analyses  of  these  data  (Shedden  et  al.,  2008).  The  training  set 
(n=256) included samples from University of Michigan Cancer Center (UM,
n=177) and Moffitt Cancer Center (HLM, n=79), and the testing set (n=186)
included the Memorial Sloan-Kettering Cancer Center (MSK, n=104) and
Dana-Farber Cancer Institute (CAN/DF, n=82) samples. Using just the NR expression
profile from training data to build a predictive model yielded a hazard ratio of 2.04
(95% CI, 1.12 to 3.71; P=0.018) for the predicted high-risk vs. the predicted
low-risk signature in testing data (Figure 2D). Interestingly, the NR signature was no
longer  predictive  of  patient  survival  when  other  clinical  variables  were  included  in 
the  analysis  (Figure  S3).  This  latter  finding  suggests  that  all  of  the  predictive 
power  of  the  NR  signature  is  contained  within  the  expression  data  and  is 
independent  of  knowing  any  of  the  other  demographic  features.  Taken  together, 
these results strongly support the utility of the NR gene signature as a prognostic
marker,  even  when  applied  and  cross-validated  independently  by  two  different 
gene  expression  platforms  (QPCR  and  microarray). 
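
A reported hazard ratio and its 95% confidence interval can be checked for internal consistency, because the interval is symmetric on the log scale. The sketch below applies this to the testing-set result quoted above (HR 2.04; 95% CI, 1.12 to 3.71). It assumes a Wald-type interval, which is an assumption about how the interval was computed, and the function name is hypothetical.

```python
from math import erfc, exp, log, sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def wald_summary(hr, ci_low, ci_high):
    """Recover log-scale standard error, z statistic and Wald p-value
    from a reported hazard ratio and its 95% confidence interval."""
    se = (log(ci_high) - log(ci_low)) / (2 * Z_95)
    z = log(hr) / se
    p = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    # Geometric mean of the bounds; should sit close to the reported HR.
    midpoint = exp((log(ci_low) + log(ci_high)) / 2)
    return {"se": se, "z": z, "p": p, "midpoint": midpoint}

testing_set = wald_summary(2.04, 1.12, 3.71)
```

For these inputs the geometric midpoint of the interval falls close to the reported hazard ratio and the Wald p-value is roughly 0.02, in line with the reported P=0.018. A p-value obtained from a different test (for example a log-rank test) need not match a Wald calculation this closely.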

Although  the  clinical  variables  did  not  improve  the  predictive  power  of  the 
NR signature, it was of interest to examine whether the risk of death (i.e., hazard
ratio  [HR])  that  is  associated  with  the  NR  signature  was  independent  of  the 
clinical  variables.  Therefore,  we  performed  a  retrospective  multivariate  Cox 
proportional-hazard  analysis  that  included  NR  signature,  gender,  age  at 
diagnosis,  use  of  adjuvant  chemotherapy,  use  of  adjuvant  radiation  therapy,  and 
stage  as  the  co-variables.  We  first  analyzed  the  Consortium  testing  dataset, 
which  included  samples  from  Memorial  Sloan-Kettering  Cancer  Center  and 
Dana-Farber  Cancer  Institute.  The  NR  signatures  that  were  used  in  this  analysis 
were  derived  from  the  prediction  model  built  from  the  Consortium  training  dataset 
(from  the  University  of  Michigan  Cancer  Center  and  Moffitt  Cancer  Center).  This 
multivariate analysis revealed that the association between NR signatures and
survival was independent of other clinical variables (HR=1.98, P=0.037) (Table 2,
left column). Next, we analyzed the association between NR signatures and
survival  in  the  combined  Consortium  training  and  testing  datasets  using  the  NR 
signatures  derived  from  the  prediction  model  built  from  the  MDACC  dataset. 
Again,  the  association  between  NR  signatures  and  survival  was  independent  of 
other clinical variables (HR=1.89, P=0.000099), consistent with the results in
Table 1. Thus, the data in Table 2 reveal that a significant correlation exists between
a patient’s NR profile and survival when adjusted for other clinical variables. As
expected,  the  correlation  between  tumor  stage  and  patient  survival  was  also 
highly  significant,  confirming  this  clinical  feature  as  a  well-recognized  prognostic 
marker  used  in  the  clinic.  It  is  interesting  to  note  that  gender  also  was 
significantly  correlated  to  patient  survival  in  our  analysis  (males  had  a  higher  risk 
than  females). 

Refinement  of  the  NR  signature  into  single  gene  predictors 

We  next  explored  the  roles  of  specific  NRs  in  the  prediction  models.  To  address 
this  question,  we  further  interrogated  the  RPART  model  (see  experimental 
procedure  for  details)  and  found  the  progesterone  receptor  (PR,  NR3C3)  and  the 
orphan  receptor,  short  heterodimeric  partner  (SHP,  NR0B2),  performed 
remarkably  well  as  single  gene  markers.  Surprisingly,  PR  expression  was 
identified  as  the  only  co-variable  left  in  the  final  RPART  prediction  model  built 
from  the  30-patient  MDACC  dataset.  PR  was  strongly  associated  with  patient 
survival  by  LOOCV  analysis  (P=0.00015)  (Figure  3A),  and  was  highly  predictive 
(P=0.0048) for lung cancer patient prognosis when independently validated in the
microarray dataset (Figure 3B). Thus, the prediction of survival by PR expression
alone was identical to that of the entire 48 NR gene set (compare Figure 3A and
3B to Figure 2A and 2B). The analysis also revealed that increased SHP
expression was a novel biomarker of a good prognosis in the 30-patient LOOCV
dataset (HR, 13.6; 95% CI, 3.01 to 61.4; P=0.000019) (Figure 3D), and this result
was further validated in the testing cohort of the Consortium dataset (HR, 1.61;
95% CI, 1.13 to 2.3; P=0.0078) (Figure 3E). The protective effect based on PR
and SHP expression was further strengthened by univariate Cox regression
modeling,  which  consistently  showed  that  expression  of  both  NRs  correlated  with 
significantly  low  hazard  ratios  in  the  microarray  dataset  (Figure  3C  and  3F). 

NR  expression  in  normal  tissue  predicts  survival  and  disease  progression 

We  also  examined  the  potential  prognostic  value  of  NR  expression  in 
histologically  normal  lung  tissue  obtained  from  areas  adjacent  to  the  tumors  of 
the  MDACC  cohort  used  in  the  above  studies.  When  the  normal  tissue 
expression  data  were  analyzed  using  the  RPART  model  and  validated  by 
LOOCV,  the  NR  signature  provided  statistically  significant  predictors  of  both 
disease progression (HR=10.2, P=0.00003) and overall patient survival (HR=2.5,
P=0.066)  (Figure  S2).  Interestingly,  reiteration  of  the  RPART  model  revealed  two 
NRs,  NR4A1  (also  known  as  nerve  growth  factor  induced  gene  B3  [NGFIB3]) 
and  mineralocorticoid  receptor  (MR,  or  NR3C2),  to  be  single-gene  predictors  for 
survival  and  disease  progression  that  yield  the  same  Kaplan-Meier  plots  as  those 
observed when using all 48 NRs (Figure S2). Although the prediction models
for normal tissue will require further validation in an independent dataset, this
analysis indicates that higher expression of NGFIB3 and MR correlates with a good
prognosis.  Given  that  most  surgical  biopsies  include  both  normal  and  tumor 
tissue,  these  data  suggest  that  analyzing  NR  expression  profiles  from  tumor  and 
corresponding  normal  lung  epithelium  will  improve  the  clinical  utility  of  this 
approach. 

SHP  expression  predicts  early  stage,  high-risk  lung  cancer  patients 

Since  identification  of  early  stage,  high-risk  patients  is  perhaps  the  most  clinically 
useful  classifier  for  guiding  therapeutic  strategy,  we  tested  whether  a  specific  NR 
gene  signature  has  predictive  power  to  classify  stage  I  lung  cancer  patients  into 
the  high-  and  the  low-risk  groups.  Importantly,  expression  of  SHP  was  identified 
by  RPART  analysis  to  significantly  differentiate  high-risk  from  low-risk  stage  I 
patients  in  the  Consortium  samples  (Figure  4A,  P= 0.033),  whereas  the  PR 
signature  was  marginally  predictive  (Figure  4B,  P=0.069).  These  results  reveal 
SHP to be the first known single-gene predictor of high-risk patients with stage I
lung  cancer. 

DISCUSSION 

Several  recent  studies  using  microarray  experiments  have  proposed  various  sets 
of  genetic  signatures  for  lung  cancer  prognosis  (Beer  et  al.,  2002;  Chen  et  al., 
2007;  Endoh  et  al.,  2004;  Lu  et  al.,  2006;  Potti  et  al.,  2006;  Shedden  et  al.,  2008). 
Although  successfully  validated  in  independent  testing  sets,  the  gene  signatures 
from  these  studies  share  little  if  any  overlap  with  one  another  (Beer  et  al.,  2002; 
Chen et al., 2007; Potti et al., 2006). Furthermore, because of the open-ended
nature of genome-wide analyses, the signatures have provided little insight into
the pathogenesis or pathophysiology of lung cancer. To date, these studies
also  have  not  identified  any  new  therapeutic  targets.  Here,  we  report  a  rationally 
designed  lung  cancer  gene  expression  study  targeting  the  NRs,  a  class  of 
transcription  factors  that  are  known  to  govern  complex  physiologic  and 
pathophysiologic  processes,  and  are  themselves  the  targets  of  validated  drugs 
for  many  diseases  including  cancer.  This  family  also  includes  a  number  of 
orphan  receptors,  many  of  which  are  currently  being  evaluated  as  potential  new 
therapeutic  targets  for  a  number  of  diseases  (Shulman  and  Mangelsdorf,  2005). 

Our  analysis  revealed  several  findings  that  should  have  important  and 
practical  implications  for  the  use  of  the  NR  gene  signature  in  a  clinical  setting. 
First,  we  demonstrated  that  the  NR-superfamily  gene-expression  signature  is  an 
excellent  predictor  of  both  patient  survival  and  progression  of  lung  cancer.  We 
used  both  unsupervised  and  supervised  approaches  to  validate  the  prognostic 
potential  of  NRs  in  independent  experiments.  In  addition  to  validating  the 
predictive  power  of  the  entire  NR  superfamily  signature  as  a  whole,  PR  and  SHP 
were  identified  as  robust,  single  gene  predictors.  The  demonstration  of  PR  as  a 
predictive  marker  is  supported  by  a  previous  retrospective  study  where  PR  was 
shown  to  be  associated  with  survival  in  patients  with  lung  adenocarcinoma 
(Ishibashi et al., 2005). Expression of PR, together with estrogen receptor α
(ERα), is now well established as a clinical guide to both prognostic anticipation
and therapeutic intervention of breast cancer. Indeed, in thinking about the next
step in our studies, the finding that certain lung cancers express specific, known
therapeutic NR targets (e.g., PR, ERα and ERβ, AR, RARs, PPARs) brings up
the  real  possibility  of  treating  these  patients  whose  tumors  express  these 
receptors  with  drugs  (agonists  or  antagonists)  that  target  the  receptors.  In  a  prior 
preclinical  study,  treatment  with  progesterone  inhibited  lung  tumor  xenograft 
growth  (Ishibashi  et  al.,  2005).  By  contrast,  anti-estrogen  therapy  is  being  tested 
as  a  lung  cancer  therapeutic  (Siegfried,  2006;  Stabile  et  al.,  2002;  Stabile  et  al., 
2005;  Traynor  et  al.,  2008);  and  in  a  mouse  lung  cancer  study,  the  use  of  a 
PPARγ agonist had a synergistic effect in reducing tumor burden when used with
cis-platinum  (Girnun  et  al.,  2007).  Evaluation  of  the  QPCR  profiles  from  our  study 
revealed a high degree of patient-to-patient variability in NR expression (Figure
S1), and this observation provides a strong rationale for using this approach to
guide  individualized  treatment  in  the  future.  A  reasonable  assumption  based  on 
our  work  here  is  that  predicting  sporadic  responses  to  drugs  like  anti-estrogens 
might  be  accomplished  by  screening  patients  for  NR  expression  using  the 
methodology  highlighted  in  this  study.  Similarly,  our  data  suggest  that  NR 
profiling  of  individual  tumors  provides  a  clinical  paradigm  for  identifying  potential 
responders  to  NR  drugs. 

A  second  finding  of  considerable  interest  was  that  the  orphan  nuclear 
receptor  SHP  is  also  a  prognostic  lung  cancer  biomarker,  particularly  of  early 
stage  disease.  To  our  knowledge  this  is  the  first  single  gene  predictor  for  high- 
risk  early-stage  lung  cancer  patients.  SHP  has  been  extensively  studied  for  its 
role in liver lipid metabolism (Goodwin et al., 2000; Lu et al., 2000), and as a
transcriptional repressor of other NRs (Nishizawa et al., 2002). Intriguingly, a
recent  report  found  SHP  expression  was  negatively  associated  with  liver 
tumorigenesis  in  a  mouse  model  (Zhang  et  al.,  2008).  These  findings  prompt 
further  exploration  into  whether  there  is  a  connection  between  the  known 
physiological  role  of  SHP  and  lung  tumorigenesis  or  whether  SHP  has  a  unique 
pathophysiologic  function  in  the  disease  pathogenesis.  To  that  end,  we  note  that 
FXR agonists, a PPARγ agonist (rosiglitazone), agents that inhibit HNF-1α
action,  and  a  number  of  orphan  drugs  are  all  inducers  of  SHP  expression 
(Chanda  et  al.,  2008).  These  compounds  might  be  tested  in  preclinical  models  to 
see if they inhibit lung tumorigenesis or malignant behavior. Also, germline
mutations  in  SHP  or  polymorphisms  in  FXR  that  regulate  the  level  of  SHP 
expression  could  play  a  role  in  SHP  function. 

A  third  noteworthy  finding  from  our  study  was  the  ability  to  predict  overall 
survival  based  on  NR  expression  in  normal  tissue  of  patients  with  lung  cancer. 
Whether this may be due to a “field effect” of the nearby cancer or to some pre-existing
nature of normal lung epithelium is not yet known. However, this finding
does  suggest  that  interrogating  the  histologically  normal  tissue  may  yield  insight 
into  lung  cancer  oncogenesis.  To  that  end  we  note  that  the  prognostic  NR 
signature  in  normal  tissue  is  completely  different  than  that  of  the  adjacent  tumor. 
In  contrast  to  that  observed  in  tumors,  distillation  of  the  NR  signature  using 
RPART  analysis  revealed  that  NGFIB3  (NR4A1)  and  MR  are  single  gene 
biomarkers  found  in  normal  tissue  for  predicting  disease  progression  and  overall 
survival,  respectively.  NR4A  family  members  have  been  shown  to  be  tumor 
suppressors in a mouse model of myeloid leukemogenesis (Mullican et al.,
2007). Similarly, underexpression of MR has been shown to be correlated with
colorectal  carcinoma  progression  (Fabio  et  al.,  2007).  These  studies  support  the 
notion  that  higher  expression  of  NR4A1  and  MR  might  play  a  protective  role 
against  lung  tumor  pathogenesis. 

A  fourth  finding  of  our  study  was  the  independent  demonstration  that  the 
NR  gene  signature  could  be  tested  and  cross-validated  using  two  different  gene 
expression  platforms,  QPCR  and  microarray.  Given  that  microarray  data  do  not 
have  the  dynamic  or  quantitative  properties  of  data  generated  by  QPCR,  the 
cross-validation  of  the  NR  gene  signature  between  different  platforms  strongly 
supports  the  idea  that  the  NR  superfamily  may  be  a  powerful  prognostic  predictor 
that  also  is  functionally  involved  in  lung  cancer  pathophysiology.  In  any  case  it 
seems  clear  that  a  combination  of  a  more  robust  collection  process 
(microdissection  vs.  tissue  mass)  together  with  more  quantitative  measurements 
(QPCR  vs.  microarray)  may  reduce  variability  and  strengthen  the  data.  Indeed, 
the  95%  confidence  intervals  of  hazard  ratios  for  both  PR  and  SHP  genes  from 
the 30 patient dataset were smaller than those from Consortium data (with sample
size  442)  (Figure  3C  and  3F).  Hazard  ratios  of  the  high  risk  vs.  low  risk  group, 
defined  using  unsupervised  cluster  results,  were  also  higher  for  the  30  patient 
dataset (Figure 1B) compared to the consortium data (Figure S4). Thus, while
labor  intensive,  improving  sample  homogeneity  and  the  quality  of  the  expression 
data  is  likely  to  provide  more  reliable  prognostic  information. 

Finally, the NR expression profile provides specific, testable hypotheses
on  the  role  of  the  NRs  in  lung  cancer  pathogenesis.  For  example,  blocking  the 
function  of  a  highly  expressed  tumor  cell  NR  could  inhibit  tumor  growth  or 
development,  while  overexpressing  a  low  abundance  tumor  cell  NR  could  test  its 
tumor  suppressive  capability.  Surprisingly,  interrogating  the  non-neoplastic  tissue 
within  the  vicinity  of  the  tumor  also  provided  an  NR  gene  signature  that  was 
predictive  for  survival.  Thus,  NR  expression  in  normal  lung  epithelium  provides 
the  basis  for  testing  NR  function  in  the  airway  field  where  lung  tumors  develop.  A 
goal  of  future  studies  will  be  to  determine  whether  the  NR  signature  is  innate  to 
the  normal  tissue  or  whether  this  expression  signature  has  been  affected  by  its 
proximity  to  the  tumor.  Perhaps  one  of  the  most  surprising  observations  from  this 
study  is  that  an  NR  signature  has  not  appeared  in  the  prognostic  signatures 
obtained  in  any  of  the  previous  global  gene-expression  studies.  This  is  true  in 
spite  of  the  fact  that,  at  least  in  the  multi-site  consortium  database  we  analyzed, 
excerpting  just  the  NR  expression  information  yielded  a  predictive  NR  gene 
signature  that  was  not  discovered  using  global  gene  analysis  (Shedden  et  al., 
2008).  Thus,  our  study  provides  a  strategic  rationale  for  using  an  informed, 
candidate  gene  profiling  approach  to  identify  prognostic  markers,  and  interrogate 
specific  gene  families  that  may  play  roles  in  the  cancer  biology. 

EXPERIMENTAL PROCEDURES


Collection  of  primary  tissue  samples 

Thirty  primary  tumor  and  corresponding  normal  tissues  (including  23 
adenocarcinomas  and  7  squamous  cell  carcinomas)  were  obtained  by  surgical 
resection under approval of the institutional review boards at MD Anderson
Cancer Center. Sixteen patients were diagnosed with stage I disease, five
patients  with  stage  II  disease,  five  patients  with  stage  III  disease,  and  four 
patients  with  stage  IV  disease.  The  clinical  data  on  each  of  the  30  patients  are 
given  in  Table  S3.  All  tissues  were  stored  at  -80  °C  after  being  snap  frozen  in 
liquid  nitrogen.  Serial  sectioning  of  each  sample  was  used  to  histologically 
evaluate  tumor  and  normal  tissue  for  subsequent  microdissection  (Maitra  et  al., 
2001).  RNAs  were  isolated  from  each  sample  using  the  Qiagen  RNeasy  Mini  Kit 
(Qiagen Sciences, MD).

Reverse  Transcription  and  Quantitative  Real-time  PCR  assay 

All cDNAs were prepared for quantitative real-time PCR (TaqMan® method) as
described (Bookout et al., 2006). Briefly, 2 µg of total RNA was DNase-treated
with 2 U of DNase I in a final volume of 20 µl containing 4.2 mM MgCl2. The reverse
transcription reaction was performed in 100 µl final volume, followed by addition
of 100 µl of DEPC-H2O. Human universal cDNAs for broadly expressed NRs or
tissue-specific cDNAs for restricted-expression NRs were used to construct a
standard curve of the following concentrations: no template control (NTC), 0.008,
0.04, 0.2, 1, 5, 25 ng for 18S RNA; and NTC, 0.016, 0.08, 0.4, 2.0, 10, 50 ng for
each NR RNA. These quantities are based on the RNA concentration used for
the  reverse  transcription  reaction.  A  negative  reverse  transcription  sample  and  a 
control  for  genomic  DNA  contamination  were  included  for  both  18S  and  NR.  Per 
sample,  10  ng  of  cDNA  was  assayed  in  triplicate  wells  of  a  384-well  plate.  The 
final  forward  and  reverse  primer  concentrations  used  were  75  nM  for  18S  rRNA 
and  300  nM  for  all  NRs.  For  this  study  the  48  NRs  plus  the  two  common  splice 
variants for PPARγ (i.e., PPARγ2) and PPARδ (i.e., PPARδ2) were included in
the  analysis  of  all  samples  from  the  MDACC  patient  set.  Primer  sequences  have 
been  reported  elsewhere  (www.NURSA.org). 

QPCR  data  analysis 

Data were imported into Microsoft Excel® and evaluated for PCR efficiency (e),
$e = 10^{-1/\text{slope}}$, where the slope was obtained from the standard curve calculated by
the sequence detection system software of the ABI7900 instrument for the
endogenous 18S reference and target NR. Relative mRNA amounts were
calculated by $\text{quantity} = e^{-C_t}$. The calculated quantities were averaged (avg),
and the standard deviations (stdev) and coefficients of variation ($\text{CV} = \text{stdev}/\text{avg}$)
were determined for the 18S and NR of each sample. Data points that showed
>17% CV were considered outliers and removed. Normalized values for
expression of each NR were calculated using
$\text{normalized value} = \text{NR quantity avg} / \text{18S quantity avg}$. The standard deviation
of the normalized value was calculated as
$(\text{normalized value}) \times [(\text{CV of reference})^2 + (\text{CV of gene of interest})^2]^{1/2}$.
Normalized values are represented as a bar graph. All these QPCR
data analysis procedures are predefined and the same as in previous
publications. The entire QPCR dataset of NR expression in normal and tumor
samples from the 30 patient cohort is available in Figure S1 and online at
www.NURSA.org. 
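
The calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' Excel workflow: the function names are hypothetical, the slope is assumed to come from a standard curve of Ct against log10 of input quantity, and the >17% CV outlier filter is omitted because the text does not specify how individual wells were excluded.

```python
import math
from statistics import mean, stdev


def pcr_efficiency(slope):
    """PCR efficiency from the standard-curve slope: e = 10^(-1/slope)."""
    return 10 ** (-1.0 / slope)


def relative_quantities(ct_values, efficiency):
    """Relative mRNA amount for each replicate well: quantity = e^(-Ct)."""
    return [efficiency ** (-ct) for ct in ct_values]


def summarize(quantities):
    """Return (avg, stdev, CV) for replicate quantities, with CV = stdev / avg."""
    avg = mean(quantities)
    sd = stdev(quantities)
    return avg, sd, sd / avg


def normalized_expression(nr_cts, ref_cts, nr_slope, ref_slope):
    """Normalize an NR to the 18S reference and propagate the replicate error.

    normalized value = NR quantity avg / 18S quantity avg
    SD = normalized value * sqrt(CV_reference^2 + CV_gene^2)
    """
    nr_avg, _, nr_cv = summarize(relative_quantities(nr_cts, pcr_efficiency(nr_slope)))
    ref_avg, _, ref_cv = summarize(relative_quantities(ref_cts, pcr_efficiency(ref_slope)))
    value = nr_avg / ref_avg
    return value, value * math.sqrt(ref_cv ** 2 + nr_cv ** 2)
```

A slope near -3.32 corresponds to an efficiency of about 2, that is, a doubling of product per cycle, which is a quick sanity check on any standard curve fed into such a calculation.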

Microarray  data  preprocessing 

Consortium  microarray  raw  data  (Shedden  et  al.,  2008)  were  downloaded  from 
National  Cancer  Institute’s  caArray  database  and  preprocessed  by  RMA 
background  correction  and  Quantile-Quantile  normalization  (Bolstad  et  al.,  2003). 
All  gene-expression  values  were  log-transformed  (on  a  base  2  scale). 
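
To make the normalization step concrete, the following is a minimal pure-Python sketch of quantile normalization followed by a base-2 log transform. It is a simplified illustration under stated assumptions, not the RMA implementation used in the study: background correction and probe-set summarization are left out, ties are broken by position rather than averaged, and the function names are hypothetical.

```python
import math


def quantile_normalize(columns):
    """Force every array (column) to share the same empirical distribution.

    columns: list of equal-length lists, one list of intensities per array.
    The k-th smallest value of each array is replaced by the mean of the
    k-th smallest values across all arrays.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(sc[k] for sc in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices from smallest to largest
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


def log2_transform(columns):
    """Log-transform expression values on a base-2 scale."""
    return [[math.log2(v) for v in col] for col in columns]
```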

Unsupervised  clustering  analysis 

The  hierarchical  clustering  algorithm  (Garzotto  et  al.,  2005)  was  used  to  group 
NR  expression  versus  the  30  MDACC  patient  cohort  based  on  the  QPCR 
expression  profile.  Gene  expression  values  were  log-transformed  (base  2  scale) 
in a manner similar to the transformation of the microarray data. Euclidean
distance and average linkage were used in the hierarchical clustering algorithm.
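
As an illustration of the clustering settings named above (Euclidean distance, average linkage), here is a small agglomerative clustering sketch in plain Python. It is not the software used in the study; the function names and the choice to stop at a fixed number of clusters are assumptions made for the example, and a production analysis would normally use an established statistics package.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage_clusters(profiles, n_clusters=2):
    """Agglomerative clustering with average linkage.

    profiles: list of log2 expression vectors, one per sample.
    Returns a list of clusters, each a sorted list of sample indices.
    """
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [sorted(c) for c in clusters]
```

Cutting the tree at two clusters mirrors the two patient clusters compared in the survival plots; the cut level itself is a modeling choice.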

Supervised  classification  using  Recursive  Partitioning 

Supervised  classification  was  performed  using  Recursive  Partitioning  and 
Regression  Trees  (Hess  et  al.,  1999),  a  widely  used  classification  method  in 
biomedical  research  (Garzotto  et  al.,  2005;  Hess  et  al.,  1999;  Koziol  et  al.,  2003; 
Valera  et  al.,  2007).  Recursive  partitioning  is  a  nonparametric  method  and  does 
not make distribution assumptions for the predictor variables. The algorithm itself
is  simple  and  intuitive.  At  each  step,  the  recursive  partitioning  program 
determines  for  each  variable  (in  this  case  for  each  of  the  NR  genes)  a  cutoff  point 
that  best  splits  all  of  the  individuals  into  low  risk  and  high  risk  groups  and  selects 
the  variable  that  performs  best.  Next,  the  process  is  repeated  on  each  of  the 
resulting subpopulations. The iteration stops when either a subpopulation
contains one class of individuals or the subpopulation is too small to subdivide. In
this  study,  the  response  variable  in  the  recursive  partitioning  model  was  the 
survival  time,  either  overall  survival  or  progression-free  survival;  the  co-variables 
in  the  model  are  all  NR  genes.  The  program  RPART,  a  freely  available  R 
package  (RDC  Team,  2008),  was  implemented  to  generate  the  decision  tree.  All 
parameters  were  used  as  the  default  values  set  in  the  package.  The  relative  risk 
of  each  individual  patient  (relative  to  the  overall  population  in  the  training  data) 
was predicted from the tree model. Patients with a predicted relative risk
greater than one were assigned to the high-risk group, and all others to the low-risk
group. In our analysis, there was no gene selection step before model building
and  all  the  parameters  used  in  the  prediction  model  were  predefined  as  the 
default  value  in  R  program;  therefore,  the  testing  data  were  not  used  for  the 
model  building  procedure,  similar  to  a  blinded  testing  procedure.  In  order  to 
explore  the  roles  of  individual  NRs  or  subsets  of  NRs  in  prediction  models,  we 
looked  into  the  tree  structure  of  prediction  models  and  found  that  PR  was  the 
only  co-variable  left  in  the  prediction  model  built  from  the  30-patient  MDACC  data 
set. To assess the prognostic ability of other NR genes, we removed PR from
the prediction model and found that SHP, as a single-gene signature, also has
prognostic ability. We further explored whether a subset of NR genes could give a
better prognosis than a single-gene signature in the MDACC data set. We identified a
three-NR gene signature by changing the parameter minsplit (the minimum
number of observations that must exist in a node in order for a split to be
attempted) to 10 in the RPART function, and found that the prognostic results were
similar to those obtained with the PR single-gene signature. Note that, by default, the minsplit parameter
equals 20, which was used in the rest of the study. All tree structures and
parameters used in the prediction model can be found in the supplemental
material. 
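
The single partitioning step described above (scan every gene, find the cutoff that best separates low-risk from high-risk patients, keep the best gene) can be sketched as follows. This is a conceptual illustration, not the RPART algorithm: RPART applies its own splitting criterion for survival responses, whereas this sketch scores candidate cutoffs with a two-group log-rank statistic as a stand-in. The function names are hypothetical, and `min_split` only mimics the role of the minsplit parameter.

```python
def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        at_risk_high = sum(1 for x, g in zip(times, in_high) if g and x >= t)
        deaths = sum(1 for x, e in zip(times, events) if e and x == t)
        deaths_high = sum(
            1 for x, e, g in zip(times, events, in_high) if e and g and x == t
        )
        frac_high = at_risk_high / at_risk
        observed_minus_expected += deaths_high - deaths * frac_high
        if at_risk > 1:
            variance += (
                deaths * frac_high * (1 - frac_high) * (at_risk - deaths) / (at_risk - 1)
            )
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def best_split(expression, times, events, min_split=20):
    """Pick the gene and cutoff that best separate the node into two groups.

    expression: dict mapping gene name -> list of expression values per patient.
    Returns (gene, cutoff, statistic), or None if the node holds fewer than
    min_split patients and no split is attempted.
    """
    if len(times) < min_split:
        return None
    best = None
    for gene, values in expression.items():
        levels = sorted(set(values))
        # Candidate cutoffs are midpoints between adjacent observed values.
        for low, high in zip(levels, levels[1:]):
            cutoff = (low + high) / 2
            in_high = [v > cutoff for v in values]
            stat = logrank_chi2(times, events, in_high)
            if best is None or stat > best[2]:
                best = (gene, cutoff, stat)
    return best
```

A full tree would apply `best_split` again to each resulting subgroup until a stopping rule is met. Lowering `min_split` lets smaller nodes be subdivided, which is how a lower minsplit can yield a model with more than one gene.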

Survival  analysis 

Overall  survival  time  was  calculated  from  the  date  of  surgery  until  death  or  the 
last  follow-up  contact.  Progression-free  survival  was  defined  as  the  time  interval 
between  the  date  of  surgery  and  the  date  of  disease  recurrence  or  death  from 
any  cause,  whichever  came  first,  or  date  of  last  follow-up  evaluation.  Survival 
curves  were  estimated  using  the  product-limit  method  of  Kaplan-Meier  (Kaplan, 
1958)  and  were  compared  using  the  log-rank  test.  Univariate  Cox  proportional- 
hazards  analysis  (Collett,  2003)  was  also  performed,  with  survival  as  the 
dependent  variable.  Two-sided  P  values  of  less  than  0.05  were  considered  to 
indicate  statistical  significance. 
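
For readers unfamiliar with the product-limit method, a minimal Python sketch of the Kaplan-Meier estimator is given below. The function name and input layout are assumptions for illustration; the study's own analysis was carried out in R.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each patient.
    events: 1 if the event (e.g., death) was observed, 0 if censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if e and x == t)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve
```

Censored patients contribute to the at-risk count up to their last follow-up time but never trigger a step in the curve, which is why they are marked separately on the plots.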

Sweave  report 


A  Sweave  document  is  enclosed  in  the  Supplemental  Data  online  to  permit 
others  to  reproduce  any  or  all  parts  of  our  statistical  analyses  report.  Sweave  is  a 
literate  programming  R  package  that  combines  the  source  code  (in  R)  and 
documentation  (in  LaTeX)  in  one  file  and  thereby  permits  reproduction  of 
published  high-throughput  data  analysis  (Coombes  et  al.,  2007;  Gentleman, 
2005;  Lamport,  1994). 


SUPPLEMENTAL DATA

The  Supplementary  Data  include  4  tables,  4  figures,  and  one  methods  (Sweave) 
document  and  can  be  found  with  this  article  online. 

FIGURE LEGENDS


Figure  1.  QPCR  analysis  of  the  NR  gene-expression  signature  in  lung 
cancer  patients 

(A)  Unsupervised  cluster  analysis  of  the  30  MDACC  lung  cancer  patient  cohort 
using  the  QPCR  profile  of  the  NR  superfamily.  Vertical  and  horizontal  axes 
represent  NR  and  lung  cancer  patient  clusters,  respectively. 

(B  and  C)  Kaplan-Meier  plots  showing  the  association  of  the  NR  gene  signature 
with  overall  patient  survival  (B)  and  disease  progression  (C).  P-values  were 
obtained  using  the  log-rank  test.  Red  color  represents  sample  Cluster  I  and  blue 
color  represents  Cluster  II  defined  by  an  unsupervised  clustering  algorithm  using 
NR  gene  profiling  data.  O  indicates  censored  samples.  ADC,  adenocarcinoma; 
SCC,  squamous  cell  carcinoma. 

Figure  2.  Kaplan-Meier  plots  showing  the  predictive  power  of  the  NR  gene 
signature  in  training  and  testing  sets  from  multiple  institutions 

(A)  LOOCV  of  the  recursive-partitioning  tree  model  (RPART)  for  the  30-sample 
MDACC  QPCR  dataset.  The  hazard  ratio  (HR)  for  the  predicted  high-risk  vs.  the 
predicted low-risk signatures was 7.03; 95% CI, 2.22 to 22.3; P=0.00015.

(B  and  C)  Independent  validation  of  the  NR  gene-expression  signature  between 
the  MDACC  cohort  and  consortium  cohort.  The  MDACC  cohort  (n=30)  training 
set  was  tested  in  the  consortium  cohort  (n=442)  (B),  and  vice  versa  (C). 

(D) Independent validation of the NR gene signature in the 442-sample
multi-institute consortium cohort using RPART analysis. The microarray datasets were
divided  into  two  groups,  one  for  the  training  and  the  other  for  the  testing  cohort. 
P-values  were  obtained  by  the  log-rank  test.  Red  and  black  lines  represent 
predicted  high-  and  low-risk  groups,  respectively.  O  indicates  censored  samples. 

Figure  3.  Identification  of  single-NR  gene  biomarkers  for  lung  cancer 
prognosis 

(A  and  B,  D  and  E)  Kaplan-Meier  survival  plots  using  single  gene  prediction 
model  of  PR  (A  and  B)  and  SHP  (D  and  E).  The  MDACC  cohort  was  tested  using 
LOOCV  (A  and  D),  or  used  as  a  training  set  and  independently  tested  in  the 
multi-site  consortium  cohort  (B  and  E).  P-values  were  obtained  using  log-rank 
test.  Red  and  black  lines  represent  high-  and  low-risk  groups,  respectively.  O 
indicates  censored  samples. 

(C  and  F)  Hazard  ratios  from  univariate  Cox  regression  model  for  PR  and  SHP 
expression,  respectively,  in  the  MDACC  and  multi-site  consortium  data  sets. 

Figure  4.  Kaplan-Meier  survival  plots  showing  single  NR  gene  predictors  in 
stage  I  lung  cancer  patients 

(A  and  B)  Predictive  models  for  SHP  (A)  and  PR  (B)  were  trained  in  the  MDACC 
samples  and  tested  in  the  stage  I  lung  cancer  patients  of  the  consortium  cohort. 
P-values  were  obtained  by  log-rank  test.  Red  and  black  lines  represent  predicted 
high-  and  the  low-risk  groups,  respectively.  O  indicates  censored  samples. 

Table 1. Patient demographics summarized by unsupervised cluster analysis of lung tumors

                            Cluster 1    Cluster 2    P-value†
Sample size                 13           16
Age (mean ± s.e.)           62.6±2.4     63±2.1       0.902
Gender (% female)           38%          56%          0.340
Race (% non-white)          0%           13%          0.187
Histology (ADC/SCC)         8/5          13/3         0.238
Stage I                     62%          56%          0.730
Stage II                    8%           19%
Stage III                   15%          19%
Stage IV                    15%          6%
Death rate                  85%          25%          0.001
Disease progression rate    92%          63%          0.062
Smokers                     15%          13%          0.823
Adjuvant Therapy            15%          6%           0.422

†Indicates P-values by t-test for Age and by Fisher’s exact test for other variables
comparing cluster 1 and 2.

ADC, adenocarcinoma; SCC, squamous cell carcinoma.

Table 2. Death hazard ratios (HR) from multivariate Cox regression analysis
from two independent datasets

                              MSK and CAN/DF Dataset    Total Consortium Dataset
Variable                      HR       P-value          HR       P-value
Gender                        1.88     0.019            1.34     0.07
Age at diagnosis              1.02     0.22             1.04     2.4e-07
Adjuvant chemotherapy         2.02     0.016            1.22     0.34
Adjuvant radiation therapy    1.48     0.210            1.50     0.059
Stage                         2.76     0.00046          3.05     8.5e-11
NR signature*                 1.98     0.037            1.89     9.9e-05

*The  NR  signature  for  the  MSK  and  CAN/DF  dataset  (n=186)  was  derived  based 
on  the  prediction  model  built  from  the  University  of  Michigan  Cancer  Center  and 
Moffitt  Cancer  Center  Consortium  training  dataset.  The  NR  signature  for  the  total 
Consortium  dataset  (n=442)  was  derived  based  on  the  prediction  model  built 
from  the  MDACC  dataset. 

MSK,  Memorial  Sloan-Kettering  Cancer  Center;  CAN/DF,  Dana-Farber  Cancer 
Institute. 

Figure S1. Expression profiles of the NR superfamily in lung tissues.

Quantitative real-time PCR analysis was performed for 48 NRs (including 2 common splice variants each
for PPARγ and PPARδ) in 30 pair-matched tissues (normal and tumor) from lung cancer patients. Relative
expression  values  were  obtained  as  described  in  Methods.  Ct  >  34  was  scored  as  below  detection.  Open 
and  filled  bars  represent  normal  and  pair-matched  tumor  tissues  from  each  patient,  respectively.  The 
patients  are  numbered  from  1-30  (see  Table  S2)  and  grouped  according  to  gender  and  survival  status 
with  each  patient  being  in  the  same  position  for  each  NR  dataset. 

Figure S2. Identification of NRs as prognostic biomarkers in normal lung tissue
from lung cancer patients.

(A and B) Kaplan-Meier plots of time to progression and survival are shown for NGFI-B
and MR, respectively. Note that these two plots are identical to those obtained when
using the entire 48 NR gene set. (A) LOOCV of the recursive-partitioning tree model of the
MDACC QPCR data in normal tissues shows that NGFI-B is the single gene left in the
predictive model for disease progression (HR=10.2, 95% CI 2.8 to 37.1; P=0.00003).
(B) Similar LOOCV analysis shows that MR is a single-gene predictor, from the entire 48 NR
gene set, associated with patient survival (HR=2.5, 95% CI 0.91 to 6.6; P=0.066).
Red and black lines represent high- and low-risk groups, respectively. Open circles
indicate censored samples.

Figure S3. Kaplan-Meier estimates of survival time based on NR expression when
clinical variables are included in the analysis.

The microarray dataset from the four-institute Consortium was divided into two groups,
one for the training cohort and the other for the testing cohort. We included the 48 NR
expression variables and clinical variables (gender, age, stage, and treatment, i.e.,
whether or not the patient received adjuvant chemotherapy and whether or not the patient
received adjuvant radiation therapy) as covariates in the RPART predictive model. The final
predictive tree structure can be seen in the Sweave report (Supplemental Data online). The
predictive model was built in the training cohort and then validated in the testing cohort.
In the testing cohort, patients in the predicted high-risk group had shorter survival than
patients in the predicted low-risk group (HR = 1.43; 95% CI, 0.9 to 2.26; P=0.13).
P-values were obtained by log-rank test. Red and black lines represent predicted high-
and low-risk groups, respectively. Open circles indicate censored samples.

Figure S4. Kaplan-Meier plots of survival time of the Consortium cohort based on
the 48 NR expression signatures.

Unsupervised hierarchical cluster analysis of the microarray signature of the 48 NRs
divides the 442 Consortium samples into two clusters. The patients in these two clusters
have significantly different survival times. P-values were obtained using the log-rank
test. Red and black colors denote the two clusters defined by the unsupervised clustering
algorithm using the NR gene signature. Open circles indicate censored samples.

Table S1. Summary of patient clinical information.

Feature               Cohort (n=30)
Age (y)
  Median              67
  Range               44.0-77.7
  Mean                63.3
Gender
  Female              15
  Male                15
Race
  White               28
  Black               1
  Asian               1
TNM Stage
  I                   17
  II                  4
  III                 5
  IV                  4
Tumor type
  ADC                 22
  SCC                 8
Survival
  Dead                16
    Female            7
    Male              9
  Alive               14
    Female            8
    Male              6
Smoking history†
  No                  4
  Yes                 26
Adjuvant therapy
  No                  27
  Yes                 3

Abbreviations: ADC, adenocarcinoma; SCC, squamous cell carcinoma; TNM,
tumor size, node involvement, metastasis status.

† Patients who had smoked at least 100 cigarettes in their lifetime were defined as
smokers.


Table S2. Clinical information on individual patients.

No.  Sample ID  Date of    Sex  Race  DOB       Tobacco      ■    Pathology  Stage  Last contact/  Date of     N.A.
                surgery                         history           (TNM)             vital status   recurrence  therapy
1    847-ADC    9/22/01    F    W     9/2/41    Yes/current  35   T2,N0,M0   IB     10/8/01 (D)    None        No
2    773-ADC    5/21/01    F    W     9/30/23   Yes/current  50   T4,N2,M1   IV     5/10/02 (D)    11/9/01     No
3    848-ADC    9/27/01    F    W     7/2/53    Yes/current  45   T2,N0,M0   IB     9/21/03 (D)    None        No
4    801-SCC    8/3/01     F    W     3/6/38    Yes/current  60   T4,N1,M1   IV     12/28/03 (D)   None        No
5    845-ADC    9/19/01    F    W     4/18/43   Yes/former   35   T2,N1,M0   IIB    10/14/05 (D)   1/2/04      No
6    947-ADC    8/30/02    M    W     5/13/34   Yes/current  75   T4,N0,M0   IIIB   12/2/02 (D)    10/7/02     No
7    758-SCC    4/28/01    M    W     10/15/31  Yes/former   8    T2,N2,M0   IIIA   10/1/01 (D)    9/17/01     No
8    857-SCC    10/24/01   M    W     9/19/30   Yes/current  80   T2,N0,M0   IB     6/11/02 (D)    None        No
9    919-ADC    6/27/02    M    W     12/6/32   Yes/current  70   T2,N0,M0   IB     5/17/03 (D)    None        No
10   878-SCC    11/19/01   M    W     2/13/39   Yes/former   43   T4,N2,M0   IIIB   12/23/02 (D)   Unknown     Yes
11   877-ADC    12/12/01   M    W     9/23/44   Yes/current  75   T2,N1,M0   IIB    5/9/04 (D)     6/13/03     No
12   797-ADC    7/26/01         W     5/25/32   Yes/current  77   T2,N0,M0   IB     2/26/05 (D)    None        No
13   896-ADC    2/11/02    F    W     12/23/52  Yes/current  30   T1,N0,M0   IA     5/5/05 (A)     4/1/04      No
14   922-ADC    7/18/02    F    As    12/4/56   No           0    T1,N0,M0   IA     5/6/08 (A)     1/26/06     No
15   799-SCC    7/30/01    F    W     1/4/40    Yes/current  100  T2,N0,M0   IB     7/13/05 (D)    5/5/03      No
16   778-ADC    6/7/01     F    W     4/18/43   Yes/former   20   T2,N0,M0   IB     3/27/08 (A)    6/9/06      No
17   764-ADC    5/15/01    F    W     5/16/43   Yes/current  105  T2,N0,M0   IB     4/12/07 (A)    None        No
18   781-SCC    6/18/01    F    W     10/20/25  Yes/current  56   T1,N0,M0   IA     2/16/07 (A)    8/15/04     No
19   803-ADC    8/7/01     F    B     5/14/36   Yes/current  50   T2,N1,M0   IIB    4/24/08 (A)    7/2/02      No
20   739-ADC    3/2/01     F    W     1/14/28   Yes/former   60   T2,N2,M0   IIIA   7/3/06 (D)     2/25/05     No
21   737-ADC    3/1/01     F    W     9/9/27    Yes/current  40   T1,N0,M0   IA     3/31/08 (A)    None        No
22   749-ADC    8/22/05    F    W     5/3/35    No           0    T2,N0,M0   IB     3/20/08 (A)    None        No
23   795-ADC    6/29/01    M    W     5/8/37    No           0    T1,N0,M0   IA     1/11/08 (A)    None        No
24   794-SCC    7/18/01    M    W     9/13/34   Yes/current  100  T2,N1,M0   IIB    5/23/07 (A)    None        No
25   792-ADC    7/12/01    M    W     1/11/34   Yes/current  100  T1,N0,M0   IA     4/8/08 (D)     1/11/05     No
26   798-ADC    7/30/01    M    W     10/9/42   Yes/current  84   T2,N0,M1   IV     6/19/08 (A)    None        No
27   818-ADC    7/10/01    M    W     8/8/57    No           0    T4,N1,M1   IV     11/10/07 (D)   8/13/02     Yes
28   756-SCC    4/19/01    M    W     6/2/45    Yes/current  85   T1,N2,M0   IIIA   5/23/07 (A)    None        No
29   782-ADC    6/21/01    M    W     10/21/32  Yes/current  150  T1,N0,M0   IA     5/9/06 (D)     10/29/01    No
30   914-ADC    7/11/02    M    W     5/22/36   Yes/former   80   T2,N0,M0   IB     10/22/07 (A)   3/28/06     No

Abbreviations: ADC, adenocarcinoma; SCC, squamous cell carcinoma; W, white; B, black; As, Asian;
TNM, tumor size, node involvement, metastasis status; A, alive; D, dead.


Table S3. Summary of NR expression data in normal and tumor lung tissue
taken from lung cancer patients.

Broadly expressed in both normal and tumor (n=22):
  ERα, RARγ, ERRα, REV-ERBα, GCNF, REV-ERBβ, GR, RORα, LXRα, RXRα, LXRβ,
  RXRβ, PPARα, TR2, PPARδ, TR4, PPARδ2, TRα, RARα, TRβ, RARβ, VDR

Selectively expressed in tumor (n=7):
  COUP-TFγ, DAX-1, ERβ, HNF4α, HNF4γ, TLX, SF1

Selectively expressed in normal tissue (n=18):
  AR, PPARγ, COUP-TFα, PPARγ2, COUP-TFβ, PR, ERRγ, RORβ, FXR, RORγ,
  LRH-1, RXRγ, MR, SHP, NGFIB3, NOR1, NURR1, PNR

Low to undetectable* expression in normal and tumor (n=3):
  CAR, ERRβ, PXR

* Ct > 34.

Raw primary data can be found at www.NURSA.org.


Table S4. Correlation of NR expression between tumor and normal tissue.

Nuclear Receptor   Pearson Correlation   P value
REV-ERBβ*          0.96                  5.47E-17
RXRβ*              0.85                  2.42E-09
COUP-TFβ*          0.76                  9.15E-07
PPARδ2*            0.76                  1.15E-06
NGFI-B3*           0.76                  1.33E-06
RARβ*              0.76                  1.39E-06
RORγ*              0.74                  2.81E-06
TR2*               0.73                  4.51E-06
PPARα*             0.70                  1.97E-05
PPARγ*             0.69                  2.05E-05
TLX*               0.69                  2.25E-05
RARγ*              0.68                  3.08E-05
LXRβ*              0.67                  5.39E-05
TR4*               0.67                  5.72E-05
ERRα*              0.64                  0.000162
RORα*              0.63                  0.000192
RXRα*              0.62                  0.000291
REV-ERBα*          0.58                  0.000838
GR                 0.57                  0.001004
COUP-TFα           0.57                  0.001103
TRα                0.56                  0.001409
COUP-TFγ           0.54                  0.001869
SHP                0.54                  0.002036
PPARδ              0.52                  0.003233
RORβ               0.52                  0.003413
LXRα               0.51                  0.004302
PXR                0.49                  0.006065
AR                 0.49                  0.006387
VDR                0.48                  0.00667
TRβ                0.48                  0.007836
PPARγ2             0.43                  0.017747
MR                 0.43                  0.018366
RARα               0.43                  0.018471
NOR1               0.40                  0.029223
ERRβ               0.39                  0.032868
NURR1              0.37                  0.045347
LRH-1              0.33                  0.077539
PR                 0.33                  0.078355
PNR                0.31                  0.092401
FXR                0.26                  0.169483
GCNF               0.26                  0.172028
ERRγ               0.25                  0.188439
ERβ                0.24                  0.208889
HNF4γ              0.23                  0.214435
HNF4α              0.11                  0.557613
RXRγ               0.11                  0.567836
ERα                0.04                  0.821723
SF-1               -0.06                 0.748715
DAX-1              -0.14                 0.473384
CAR                -0.15                 0.437427

QPCR expression levels for NRs from the 30 pairs of normal and tumor lung samples were
determined, and a Pearson correlation coefficient was calculated with an associated P
value. Statistically significant values (P < 0.001 after multiple-testing correction) are
marked with an asterisk.
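
The per-receptor calculation described in the Table S4 footnote can be sketched in R as follows. This is only an illustration on simulated values: the gene names are hypothetical, and the Bonferroni adjustment is an assumption, since the table does not state which multiple-testing correction was applied.

```r
# Toy illustration only: simulated expression values, not the study data.
set.seed(1)
n_pairs <- 30                              # 30 matched normal/tumor pairs
genes   <- c("geneA", "geneB", "geneC")    # hypothetical receptor names

normal <- matrix(rnorm(n_pairs * 3), ncol = 3, dimnames = list(NULL, genes))
# geneA tracks its normal value closely, geneB weakly, geneC not at all
tumor <- cbind(geneA = normal[, "geneA"] + rnorm(n_pairs, sd = 0.2),
               geneB = 0.3 * normal[, "geneB"] + rnorm(n_pairs),
               geneC = rnorm(n_pairs))

# One Pearson test per gene: correlation coefficient and raw p-value
res <- t(sapply(genes, function(g) {
  ct <- cor.test(normal[, g], tumor[, g], method = "pearson")
  c(r = unname(ct$estimate), p = ct$p.value)
}))
res <- as.data.frame(res)

# Assumed correction method (Bonferroni); flag adjusted p < 0.001 with "*"
res$p.adj <- p.adjust(res$p, method = "bonferroni")
res$flag  <- ifelse(res$p.adj < 0.001, "*", "")
res[order(-res$r), ]
```

Sorting by decreasing `r` reproduces the layout of the table, with flagged receptors appearing at the top.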

Nuclear Receptor Expression Profiling Defines a
Set of Prognostic Biomarkers for Lung Cancer

Yangsik Jeong, Yang Xie, Guanghua Xiao, Xian-Jin Xie, Carmen Behrens, Luc Girard,
Edward F. Patz, Jr., Ignacio I Wistuba, John D Minna & David J Mangelsdorf


Sweave  Report 

Include  R  library 

> library(survival)
> library(rpart)

Function for showing p-values in figures

> pv.expr <- function(x, digits = 1) {
+     if (!x)
+         return(0)
+     exponent <- floor(log10(x))
+     base <- round(x/10^exponent, digits)
+     ifelse(x > 1e-04, paste("pv = ", base * (10^exponent), sep = ""),
+         paste("pv = ", base, "E", exponent, sep = ""))
+ }

Generate a heatmap for the MDACC PCR data. If a PCR value is 0, replace it
with the minimum non-zero value, then take the log2 transformation of the PCR
values.
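
As a small illustration of this preprocessing step, the sketch below applies the same zero replacement and log2 transformation to a toy matrix. The numbers and the names `pcr`, `NR_a`, `NR_b` and `NR_c` are made up for the example and are not taken from the MDACC data.

```r
# Toy matrix standing in for the PCR values (hypothetical numbers)
pcr <- matrix(c(0, 0.5, 2,
                4, 0,   8),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("patient1", "patient2"),
                              c("NR_a", "NR_b", "NR_c")))

floor_value <- min(pcr[pcr != 0])   # smallest non-zero measurement
pcr[pcr == 0] <- floor_value        # zeros would otherwise give log2(0) = -Inf
pcr_log2 <- log2(pcr)
pcr_log2
```

Replacing zeros before the transformation keeps every entry finite, which matters because the distance and clustering functions used later cannot handle `-Inf`.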

> mda <- read.csv("MDA_data.csv", row.names = 1)
> mda.pcr <- mda[, -(1:4)]
> mda.pcr[mda.pcr == 0] <- min(mda.pcr[mda.pcr != 0])
> mda[, -(1:4)] <- mda.pcr <- log2(mda.pcr)
> rgb.palette <- colorRampPalette(c("green", "black", "red"), space = "rgb")
> heatmap(t(mda.pcr), scale = "none", col = rgb.palette(13), margins = c(4,
+     4), cex.axis = 1)

Figure 1a. Heatmap for MDACC PCR data.

Characterize MDACC patients using an unsupervised clustering algorithm.

> cluster <- cutree(hclust(dist(mda.pcr)), k = 3)
> mda.clust <- data.frame(cluster, mda[, 1:4])[cluster != 3, ]

Cluster 3 contains only one patient and is regarded as an outlier based on its NR
expression profile. On inspecting that patient's clinical data, we found that the
patient had a very short survival time.
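
The outlier handling above can be illustrated on simulated data. In this sketch the expression values are invented, and singleton clusters are detected by size rather than by the hard-coded label 3 used in the report; that generalization is an assumption made for the example.

```r
set.seed(42)
# Hypothetical log2 expression: two groups of patients plus one extreme profile
expr <- rbind(matrix(rnorm(5 * 4, mean = 0), nrow = 5),
              matrix(rnorm(5 * 4, mean = 6), nrow = 5),
              matrix(rep(40, 4), nrow = 1))
rownames(expr) <- paste0("p", 1:11)

# Hierarchical clustering on Euclidean distances, cut into three clusters
cluster <- cutree(hclust(dist(expr)), k = 3)
sizes   <- table(cluster)

# Clusters holding a single patient are treated as outliers and dropped
singletons <- as.integer(names(sizes)[sizes == 1])
kept <- expr[!(cluster %in% singletons), , drop = FALSE]
sizes
```

With `k = 3` the extreme profile ends up alone in its own cluster, so removing singleton clusters leaves the two main groups for the survival comparison.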

> sf <- survfit(Surv(Survival_Time, Dead) ~ cluster, data = mda.clust)
> logrank <- survdiff(Surv(Survival_Time, Dead) ~ cluster, data = mda.clust)
> logrank
Call:
survdiff(formula = Surv(Survival_Time, Dead) ~ cluster, data = mda.clust)

           N Observed Expected (O-E)^2/E (O-E)^2/V
cluster=1 16        4     10.8      4.28      16.5
cluster=2 13       11      4.2     11.02      16.5

 Chisq= 16.5  on 1 degrees of freedom, p= 4.84e-05
> pv <- pchisq(logrank$chisq, 1, lower.tail = F)
> summary(coxph(Surv(Survival_Time, Dead) ~ cluster, data = mda.clust))
Call:
coxph(formula = Surv(Survival_Time, Dead) ~ cluster, data = mda.clust)

  n= 29
        coef exp(coef) se(coef)    z       p
cluster 2.16       8.7    0.613 3.53 0.00042

        exp(coef) exp(-coef) lower .95 upper .95
cluster       8.7      0.115      2.62      28.9

Rsquare= 0.395   (max possible= 0.953 )
Likelihood ratio test= 14.6  on 1 df,   p=0.000134
Wald test            = 12.4  on 1 df,   p=0.000417
Score (logrank) test = 16.5  on 1 df,   p=4.84e-05

> plot(sf, main = "", xlab = "Survival Time (month)", ylab = "Survival",
+     cex.lab = 1.5, mark = c(1, 19), cex = 1.2)
> text(60, 0.5, pv.expr(pv), cex = 1.5)
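
As a quick consistency check on the output above, the hazard ratio, its Wald 95% confidence interval and the log-rank p-value can be recomputed from the printed coefficient, standard error and chi-square statistic. The sketch below uses only the rounded values shown in the transcript, so the results should land close to the printed 8.7 (2.62 to 28.9) and p = 4.84e-05 rather than match them digit for digit.

```r
# Rounded values copied from the coxph() and survdiff() output
coef  <- 2.16
se    <- 0.613
chisq <- 16.5

hr   <- exp(coef)                                   # hazard ratio
ci   <- exp(coef + c(-1, 1) * qnorm(0.975) * se)    # Wald 95% confidence interval
p_lr <- pchisq(chisq, df = 1, lower.tail = FALSE)   # log-rank p-value, 1 df

round(c(HR = hr, lower = ci[1], upper = ci[2]), 2)
signif(p_lr, 3)
```

The same three lines apply to any of the single-covariate Cox summaries in this report, since each reports `coef`, `se(coef)` and a one-degree-of-freedom chi-square.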
> sf <- survfit(Surv(TOE, Progression) ~ cluster, data = mda.clust)
> logrank <- survdiff(Surv(TOE, Progression) ~ cluster, data = mda.clust)
> logrank
Call:
survdiff(formula = Surv(TOE, Progression) ~ cluster, data = mda.clust)

n=28, 1 observation deleted due to missingness.

           N Observed Expected (O-E)^2/E (O-E)^2/V
cluster=1 16       10    15.83      2.15      9.76
cluster=2 12       11     5.17      6.57      9.76

 Chisq= 9.8  on 1 degrees of freedom, p= 0.00178
> pv <- pchisq(logrank$chisq, 1, lower.tail = F)
> summary(coxph(Surv(TOE, Progression) ~ cluster, data = mda.clust))
Call:
coxph(formula = Surv(TOE, Progression) ~ cluster, data = mda.clust)

  n=28 (1 observation deleted due to missingness)
        coef exp(coef) se(coef)    z      p
cluster 1.46      4.31    0.502 2.91 0.0036

        exp(coef) exp(-coef) lower .95 upper .95
cluster      4.31      0.232      1.61      11.5

Rsquare= 0.267   (max possible= 0.984 )
Likelihood ratio test= 8.69  on 1 df,   p=0.00319
Wald test            = 8.46  on 1 df,   p=0.00362
Score (logrank) test = 9.76  on 1 df,   p=0.00178

> {
+     plot(sf, main = "", xlab = "Time to Progression (month)",
+         ylab = "Progression free survival", cex.lab = 1.5, mark = c(1,
+             19), cex = 1.2)
+     text(50, 0.9, pv.expr(pv), cex = 1.5)
+ }

Figure 1c. KM plot for progression-free survival.

Predict MDACC patients' survival using recursive-partitioning tree analysis
(RPART), then use leave-one-out cross-validation (LOOCV) to check
the performance.
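
The leave-one-out scheme can be sketched generically as below. To keep the example self-contained it uses simulated data and an ordinary linear model (`lm`) in place of the RPART survival tree, so it illustrates only the cross-validation loop, not the report's model.

```r
set.seed(7)
# Simulated data: one predictor and a noisy outcome (not the study data)
n   <- 30
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n, sd = 0.5)

pred <- numeric(n)
for (i in seq_len(n)) {
  fit     <- lm(y ~ x, data = dat[-i, ])                       # train without sample i
  pred[i] <- predict(fit, newdata = dat[i, , drop = FALSE])    # predict the held-out sample
}

# Every prediction comes from a model that never saw that sample
loocv_rmse <- sqrt(mean((dat$y - pred)^2))
loocv_rmse
```

Because each sample is predicted by a model fitted to the other 29, the resulting predictions can be compared with the observed outcomes without the optimism of an in-sample fit.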

> mda.surv <- mda[, -(3:4)]
> fit <- rpart(Surv(Survival_Time, Dead) ~ ., data = mda.surv)
> print(fit)
n= 30

node), split, n, deviance, yval
      * denotes terminal node

1) root 30 41.707190 1.0000000
  2) PR>=-4.899576 17 11.093620 0.3051777 *
  3) PR< -4.899576 13  8.068598 2.8613000 *
> res <- rep(0, 30)
> for (i in 1:30) {
+     fit <- rpart(Surv(Survival_Time, Dead) ~ ., data = mda.surv[-i,
+         ])
+     res[i] <- (predict(fit, newdata = mda.surv[i, ]) > 1)
+ }
> sf <- survfit(Surv(Survival_Time, Dead) ~ res, data = mda.surv)
> summary(coxph(Surv(Survival_Time, Dead) ~ res, data = mda.surv))
Call:
coxph(formula = Surv(Survival_Time, Dead) ~ res, data = mda.surv)

  n= 30
    coef exp(coef) se(coef)    z       p
res 1.95      7.03    0.589 3.31 0.00092

    exp(coef) exp(-coef) lower .95 upper .95
res      7.03      0.142      2.22      22.3

Rsquare= 0.355   (max possible= 0.958 )
Likelihood ratio test= 13.2  on 1 df,   p=0.000285
Wald test            = 11.0  on 1 df,   p=0.000925
Score (logrank) test = 14.3  on 1 df,   p=0.000152

> logrank <- survdiff(Surv(Survival_Time, Dead) ~ res, data = mda.surv)
> logrank
Call:
survdiff(formula = Surv(Survival_Time, Dead) ~ res, data = mda.surv)

       N Observed Expected (O-E)^2/E (O-E)^2/V
res=0 16        4    10.91      4.38      14.3
res=1 14       12     5.09      9.37      14.3

 Chisq= 14.3  on 1 degrees of freedom, p= 0.000152
> pv <- pchisq(logrank$chisq, 1, lower.tail = F)

Use a Kaplan-Meier plot to show the predictive power of the NR gene signature
in the MDACC LOOCV analysis.

> plot(sf, conf.int = F, main = "MDACC LOOCV", xlab = "Survival Time (month)",
+     ylab = "Survival", cex.lab = 1.2, mark = c(1, 19), cex = 1.5)
> text(50, 0.6, pv.expr(pv), cex = 1.5)

Figure 2a. LOOCV of the recursive-partitioning tree model for the 30-sample
MDACC QPCR data set.

Read Consortium data

> Consortium <- read.csv("Consortium_data.csv", row.names = 1)

Divide the Consortium data into training and testing sets, using the same
arrangement as the Shedden et al. paper.

> dat.train <- Consortium[Consortium$TESTTYPE == "Train", c(1,
+     2, 10:57)]
> dat.test <- Consortium[Consortium$TESTTYPE == "Test", c(1, 2,
+     10:57)]

> fit <- rpart(Surv(month, death) ~ ., data = dat.train)
> print(fit)
n=254 (1 observation deleted due to missingness)

node), split, n, deviance, yval
      * denotes terminal node

1) root 254 383.721300 1.0000000
  2) SF.1>=5.035 238 355.235900 0.9321384
    4) COUP.TFb>=6.17875 202 291.440300 0.8316414
      8) PPARd< 6.396667 190 264.976500 0.7728499
        16) COUP.TFb< 7.47875 178 244.205400 0.7107610
          32) PPARd< 5.698333 7 1.733478 0.1332612 *
          33) PPARd>=5.698333 171 234.749600 0.7479395
            66) DAX.1< 4.6775 62 71.571530 0.4847427
              132) TRb< 6.335 9 1.758713 0.1206434 *
              133) TRb>=6.335 53 64.098200 0.5645271
                266) ERRa< 7.2075 46 48.769070 0.4704948
                  532) COUP.TFg>=6.725 13 11.104640 0.1410123 *
                  533) COUP.TFg< 6.725 33 29.811300 0.6617683
                    1066) RXRg>=5.255 10 5.824201 0.2138250 *
                    1067) RXRg< 5.255 23 17.427060 0.9020286 *
                267) ERRa>=7.2075 7 10.083650 1.4280350 *
            67) DAX.1>=4.6775 109 154.239700 0.9353668
              134) NOR1< 5.808333 41 64.483780 0.5934419
                268) NURR1< 5.873333 15 11.672550 0.2041338 *
                269) NURR1>=5.873333 26 43.908780 0.8997544
                  538) PR>=4.275 17 26.493760 0.5361375 *
                  539) PR< 4.275 9 6.248823 2.3703860 *
              135) NOR1>=5.808333 68 81.456440 1.2175480
                270) MR>=5.945 58 68.849110 1.0750540
                  540) ERa>=5.621667 7 4.285743 0.3296640 *
                  541) ERa< 5.621667 51 58.806370 1.2095250
                    1082) PNR< 4.7225 16 26.455710 0.6444469 *
                    1083) PNR>=4.7225 35 26.466970 1.5237510
                      2166) COUP.TFb< 6.83625 26 12.252950 1.2556770 *
                      2167) COUP.TFb>=6.83625 9 8.916298 2.8217610 *
                271) MR< 5.945 10 6.484173 2.5454740 *
        17) COUP.TFb>=7.47875 12 9.302349 2.1612190 *
      9) PPARd>=6.396667 12 13.595450 2.5960540 *
    5) COUP.TFb< 6.17875 36 52.937150 1.6650440
      10) ERRa< 7.0875 23 32.112400 1.2075800
        20) COUP.TFg>=6.63 12 12.389740 0.7048566 *
        21) COUP.TFg< 6.63 11 12.325520 2.2287740 *
      11) ERRa>=7.0875 13 12.719170 3.1216850 *
  3) SF.1< 5.035 16 13.967780 2.6992260 *

> group <- ifelse(predict(fit, newdata = dat.test) > 1, "High",
+     " Low")
> sf <- survfit(Surv(month, death) ~ group, data = dat.test)
> summary(coxph(Surv(month, death) ~ group, data = dat.test))
Call:
coxph(formula = Surv(month, death) ~ group, data = dat.test)

  n=186 (1 observation deleted due to missingness)
           coef exp(coef) se(coef)    z    p
groupHigh 0.712      2.04    0.306 2.33 0.02

          exp(coef) exp(-coef) lower .95 upper .95
groupHigh      2.04       0.49      1.12      3.71

Rsquare= 0.033   (max possible= 0.975 )
Likelihood ratio test= 6.25  on 1 df,   p=0.0124
Wald test            = 5.42  on 1 df,   p=0.0200
Score (logrank) test = 5.65  on 1 df,   p=0.0175

> logrank <- survdiff(Surv(month, death) ~ group, data = dat.test)
> logrank
Call:
survdiff(formula = Surv(month, death) ~ group, data = dat.test)

n=186, 1 observation deleted due to missingness.

             N Observed Expected (O-E)^2/E (O-E)^2/V
group= Low  51       13     22.3      3.89      5.64
group=High 135       61     51.7      1.68      5.64

 Chisq= 5.6  on 1 degrees of freedom, p= 0.0175
> pv <- pchisq(logrank$chisq, 1, lower.tail = F)

Figure 2B. Independent validation of the NR gene signature in the 442-sample
multi-institute consortium cohort using RPART analysis. The microarray data
sets were divided into two groups, one for the training cohort and the other for the
testing cohort.

> plot(sf, conf.int = F, main = "Consortium Train to Test without clinical variable",
+     xlab = "Time to Dead (Month)", ylab = "Survival", cex.lab = 1.2,
+     mark = c(1, 19), cex = 1.5, lty = 1, lwd = 2)
> text(100, 0.9, pv.expr(pv), cex = 1.5)

Figure 2b. Independent validation of the NR gene signature in the 442-sample
multi-institute consortium cohort using RPART analysis.

Normalize and combine MDACC data and Consortium data

> Consortium.expr <- Consortium[, 10:57]
> Consortium.expr <- scale(Consortium.expr)
> mda.surv <- mda[, -(3:4)]
> mda.surv[, -(1:2)] <- scale(mda.surv[, -(1:2)])
> common.gene <- intersect(colnames(mda.surv)[-(1:2)], colnames(Consortium.expr))
> mda.data <- data.frame(type = "mda", Stage = NA, mda.surv[, 1:2],
+     mda.surv[, common.gene])
> Consortium.data <- data.frame(type = "Consortium", Stage = Consortium$stage,
+     Dead = Consortium$death, Survival_Time = Consortium$month,
+     Consortium.expr[, common.gene])
> combined <- data.frame(rbind(mda.data, Consortium.data))
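
The normalize-and-combine step can be illustrated with two small made-up cohorts. The sketch below assumes, as in the code above, that each gene is z-scored within its own cohort and that only genes present in both cohorts are kept; the cohort and gene names are hypothetical.

```r
# Two hypothetical cohorts measured on different scales (made-up numbers)
cohort_a <- data.frame(G1 = c(1, 2, 3), G2 = c(10, 20, 30), G3 = c(5, 5, 8))
cohort_b <- data.frame(G2 = c(100, 300, 200, 400), G1 = c(7, 9, 8, 6))

# z-score each gene within its own cohort so the scales are comparable
a_scaled <- as.data.frame(scale(cohort_a))
b_scaled <- as.data.frame(scale(cohort_b))

# keep only genes measured in both cohorts, in a common column order
common <- intersect(colnames(a_scaled), colnames(b_scaled))
combined <- rbind(data.frame(type = "a", a_scaled[, common]),
                  data.frame(type = "b", b_scaled[, common]))
combined
```

Scaling within each cohort puts QPCR-like and microarray-like measurements on a common unitless scale, which is what allows a split threshold learned in one cohort to be applied to the other.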

Use MDACC as training data and Consortium as testing data

> data.train <- combined[combined$type == "mda", ]
> data.test <- combined[combined$type == "Consortium", ]
> fit <- rpart(Surv(Survival_Time, Dead) ~ ., data = data.train)
> print(fit)
n= 30

node), split, n, deviance, yval
      * denotes terminal node

1) root 30 41.707190 1.0000000
  2) PR>=0.04657526 17 11.093620 0.3051777 *
  3) PR< 0.04657526 13  8.068598 2.8613000 *
> group <- ifelse(predict(fit, newdata = data.test) > 1, "High",
+     " Low")
> sf <- survfit(Surv(Survival_Time, Dead) ~ group, data = data.test)
> summary(coxph(Surv(Survival_Time, Dead) ~ group, data = data.test))
Call:
coxph(formula = Surv(Survival_Time, Dead) ~ group, data = data.test)

  n=440 (2 observations deleted due to missingness)
           coef exp(coef) se(coef)   z      p
groupHigh 0.377      1.46    0.135 2.8 0.0051

          exp(coef) exp(-coef) lower .95 upper .95
groupHigh      1.46      0.686      1.12       1.9

Rsquare= 0.018   (max possible= 0.997 )
Likelihood ratio test= 8.03  on 1 df,   p=0.00460
Wald test            = 7.85  on 1 df,   p=0.00509
Score (logrank) test = 7.94  on 1 df,   p=0.00484

> logrank <- survdiff(Surv(Survival_Time, Dead) ~ group, data = data.test)
> logrank
Call:
survdiff(formula = Surv(Survival_Time, Dead) ~ group, data = data.test)

n=440, 2 observations deleted due to missingness.

             N Observed Expected (O-E)^2/E (O-E)^2/V
group= Low 209       91      112      4.11      7.97
group=High 231      145      124      3.74      7.97

 Chisq= 8  on 1 degrees of freedom, p= 0.00476
> pv <- pchisq(logrank$chisq, 1, lower.tail = F)

Figure 2C. MDACC to consortium. The MDACC cohort (n=30) was used
as the training set and the predictive model was tested in the consortium cohort
(n=442).

> plot(sf, conf.int = F, main = "MDACC to consortium", xlab = "Time to Dead (Month)",
+     ylab = "Survival", cex.lab = 1.2, mark = c(1, 19), cex = 1.5,
+     lwd = 2)
> text(140, 0.9, pv.expr(pv), cex = 1.5)

Figure 2c. MDACC to consortium

Use consortium as the training data and MDACC as the testing data

> data.test <- combined[combined$type == "mda", -2]
> data.train <- combined[combined$type == "Consortium", -2]
> fit <- rpart(Surv(Survival_Time, Dead) ~ ., data = data.train)
> print(fit)
n=440 (2 observations deleted due to missingness)

node), split, n, deviance, yval
      * denotes terminal node

1) root 440 650.960700 1.0000000
  2) SF.1>=-1.600797 422 612.349500 0.9473605
    4) PPARd< 1.546829 393 557.598200 0.8889341
      8) RORa>=-1.110901 353 488.590600 0.8127724
        16) RARa>=-0.8263496 282 357.962100 0.7005085
          32) RARg< 0.7062089 206 236.112100 0.5957128
            64) NURR1< -1.126548 22 7.912262 0.1177802 *
            65) NURR1>=-1.126548 184 215.220500 0.6709268
              130) PXR>=1.269161 9 1.729126 0.1354368 *
              131) PXR< 1.269161 175 206.439800 0.7109939
                262) TR2>=1.257944 11 4.844043 0.1918291 *
                263) TR2< 1.257944 164 194.013000 0.7694938
                  526) LXRa>=-0.8594324 134 144.483400 0.6588147 *
                  527) LXRa< -0.8594324 30 42.461040 1.3332290
                    1054) ERa>=0.1125855 13 13.110410 0.6988573 *
                    1055) ERa< 0.1125855 17 21.932320 2.1376250 *
          33) RARg>=0.7062089 76 113.805200 1.0449810
            66) GR>=0.1164013 21 28.988230 0.4769354
              132) NGFIB3>=0.2269965 9 1.775888 0.1120559 *
              133) NGFIB3< 0.2269965 12 18.553220 1.0365880 *
            67) GR< 0.1164013 55 76.323680 1.3577180
              134) ERa>=1.44486 7 3.714183 0.2946938 *
              135) ERa< 1.44486 48 61.587110 1.6885060
                270) FXR>=1.1885 7 7.610207 0.3961379 *
                271) FXR< 1.1885 41 44.954860 2.0384140 *
        17) RARa< -0.8263496 71 118.121600 1.2993340
          34) SHP>=0.9231351 12 15.193210 0.4609037 *
          35) SHP< 0.9231351 59 95.018530 1.5203210
            70) DAX.1< -0.4961372 17 22.775660 0.6645850 *
            71) DAX.1>=-0.4961372 42 59.723230 2.1019450
              142) LXRb>=-0.3823468 15 17.205720 1.1617770 *
              143) LXRb< -0.3823468 27 32.151590 3.1673350 *
      9) RORa< -1.110901 40 54.464440 1.8193850
        18) AR>=-0.4973712 24 24.203220 1.1693540 *
        19) AR< -0.4973712 16 18.400040 3.6731330 *
    5) PPARd>=1.546829 29 40.620470 2.1935050
      10) PPARg< 0.3242693 19 22.356910 1.4967360 *
      11) PPARg>=0.3242693 10 11.316950 3.9286630 *
  3) SF.1< -1.600797 18 19.521260 3.1164310 *

> group <- ifelse(predict(fit, newdata = data.test) > 1, "High",
+     " Low")
> table(group)
group
 Low High
  21    9
> sf <- survfit(Surv(Survival_Time, Dead) ~ group, data = data.test)
> summary(coxph(Surv(Survival_Time, Dead) ~ group, data = data.test))
Call:
coxph(formula = Surv(Survival_Time, Dead) ~ group, data = data.test)

  n= 30
          coef exp(coef) se(coef)    z     p
groupHigh  1.1      3.01     0.55 2.00 0.045

          exp(coef) exp(-coef) lower .95 upper .95
groupHigh      3.01      0.333      1.02      8.83

Rsquare= 0.115   (max possible= 0.958 )
Likelihood ratio test= 3.66  on 1 df,   p=0.0557
Wald test            = 4.01  on 1 df,   p=0.0452
Score (logrank) test = 4.38  on 1 df,   p=0.0363

> logrank <- survdiff(Surv(Survival_Time, Dead) ~ group, data = data.test)
> logrank
Call:
survdiff(formula = Surv(Survival_Time, Dead) ~ group, data = data.test)

            N Observed Expected (O-E)^2/E (O-E)^2/V
group= Low 21       10    13.12     0.742      4.38
group=High  9        6     2.88     3.381      4.38

 Chisq= 4.4  on 1 degrees of freedom, p= 0.0363
> pv <- pchisq(logrank$chisq, 1, lower.tail = F)

Figure 2D. Cross-validation of the NR gene-expression signature. The predictive
model trained on the consortium cohort (n=442) was tested in the MDACC cohort (n=30).

> plot(sf, conf.int = F, main = "Consortium to MDACC", xlab = "Time to Dead (Month)",
+     ylab = "Survival", cex.lab = 1.2, mark = c(1, 19), cex = 1.5,
+     lwd = 2)
> text(20, 0.2, pv.expr(pv), cex = 1.5)

Figure 2d. Consortium to MDACC

Figure 3A is the same as Figure 2A.
Figure 3B is the same as Figure 2C.

>  ind.PR  <-  which (colnames(mda. surv)  ==  "PR") 

>  fit  <-  rpart (Surv (Survival _Time }  Dead)  ~  .  ,  data  =  mda. surv[, 

+  -ind . PR] ) 

>  print  (fit) 

n=  30 

node),  split,  n,  deviance,  yval 
*  denotes  terminal  node 

1)  root  30  41.707190  1.0000000 

2)  SHP>=0. 4814448  13  5.878756  0.1838269  * 

3)  SHP<  0.4814448  17  14.064460  2.2471280  * 

>  res  <-  rep(0,  30) 

>  for  (i  in  1:30)  { 

+  fit  <-  rpart (Surv (Survival _Time ,  Dead)  ~  .,  data  =  mda. surv [-i , 
+  -ind . PR] ) 

+  res [i]  <-  (predict (fit ,  newdat  =  mda. surv [i ,  ])  >  1) 
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+  } 

>  summary (coxph( Surv( Survival _Time ,  Dead)  ~  res,  data  =  mda.surv)) 

Call: 

coxph (formula  =  Surv(Survival_Time ,  Dead)  ~  res,  data  =  mda.surv) 
n=  30 

coef  exp(coef)  se(coef)  z  p 

res  2.61  13.6  0.769  3.39  0.00069 

exp(coef)  exp(-coef)  lower  .95  upper  .95 
res  13.6  0.0735  3.01  61.4 

Rsquare=  0.461  (max  possible=  0.958  ) 

Likelihood  ratio  test=  18.6  on  1  df ,  p=1.65e-05 

Wald test = 11.5 on 1 df, p=0.00069

Score  (logrank)  test  =18.3  on  1  df ,  p=1.91e-05 

> sf <- survfit(Surv(Survival_Time, Dead) ~ res, data = mda.surv)
> logrank <- survdiff(Surv(Survival_Time, Dead) ~ res, data = mda.surv)
> logrank
Call:
survdiff(formula = Surv(Survival_Time, Dead) ~ res, data = mda.surv)

       N Observed Expected (O-E)^2/E (O-E)^2/V
res=0 14        2    10.04      6.44      18.3
res=1 16       14     5.96     10.85      18.3

 Chisq= 18.3  on 1 degrees of freedom, p= 1.91e-05

> pv <- pchisq(logrank$chisq, 1, lower.tail = F)
> plot(sf, conf.int = F, main = "MDACC LOOCV SHP", xlab = "Survival Time (month)",
+     ylab = "Survival", cex.lab = 1.2, mark = c(1, 19), cex = 1.5)
> text(60, 0.6, pv.expr(pv), cex = 1.5)

Figure 3d. The MDACC LOOCV without PR gene

> data.train <- combined[combined$type == "mda", colnames(combined) !=
+     "PR"]
> data.test <- combined[combined$type == "Consortium", ]
> fit <- rpart(Surv(Survival_Time, Dead) ~ ., data = data.train)
> print(fit)
n= 30

node), split, n, deviance, yval
      * denotes terminal node

1) root 30 41.707190 1.0000000
  2) SHP>=0.4814448 13  5.878756 0.1838269 *
  3) SHP< 0.4814448 17 14.064460 2.2471280 *
> group <- ifelse(predict(fit, newdat = data.test) > 1, "High",
+     " Low")
> sf <- survfit(Surv(Survival_Time, Dead) ~ group, data = data.test)
> summary(coxph(Surv(Survival_Time, Dead) ~ group, data = data.test))

Call:
coxph(formula = Surv(Survival_Time, Dead) ~ group, data = data.test)

  n=440 (2 observations deleted due to missingness)
           coef exp(coef) se(coef)    z      p
groupHigh 0.478      1.61    0.181 2.63 0.0084

          exp(coef) exp(-coef) lower .95 upper .95
groupHigh      1.61       0.62      1.13       2.3

Rsquare= 0.017   (max possible= 0.997 )
Likelihood ratio test= 7.73  on 1 df,   p=0.00544
Wald test            = 6.94  on 1 df,   p=0.00843
Score (logrank) test = 7.07  on 1 df,   p=0.00782

> logrank <- survdiff(Surv(Survival_Time, Dead) ~ group, data = data.test)
> logrank
Call:
survdiff(formula = Surv(Survival_Time, Dead) ~ group, data = data.test)

n=440, 2 observations deleted due to missingness.

             N Observed Expected (O-E)^2/E (O-E)^2/V
group= Low  92       36       53      5.46      7.08
group=High 348      200      183      1.58      7.08

 Chisq= 7.1  on 1 degrees of freedom, p= 0.00779

> pv <- pchisq(logrank$chisq, 1, lower.tail = F)
> {
+     plot(sf, conf.int = F, main = "MDACC to consortium, SHP",
+         xlab = "Time to Dead (Month)", ylab = "Survival", cex.lab = 1.2,
+         mark = c(1, 19), cex = 1.5, lwd = 2)
+     text(140, 0.9, pv.expr(pv), cex = 1.5)
+ }

Figure 3e. MDACC to consortium without PR gene

Kaplan-Meier survival plots showing the performance of the single NR gene predictors, SHP and PR separately, in stage I lung cancer patients.

Figure 4A

Predicting stage I lung cancer patient survival using SHP alone

> data.test <- combined[combined$type == "Consortium" & combined$Stage ==
+     1, colnames(combined) != "PR"]
> group <- ifelse(predict(fit, newdat = data.test) > 1, "High",
+     " Low")
> sf <- survfit(Surv(Survival_Time, Dead) ~ group, data = data.test)
> summary(coxph(Surv(Survival_Time, Dead) ~ group, data = data.test))

Call:
coxph(formula = Surv(Survival_Time, Dead) ~ group, data = data.test)

  n= 275
           coef exp(coef) se(coef)    z     p
groupHigh 0.557      1.74    0.264 2.11 0.035

          exp(coef) exp(-coef) lower .95 upper .95
groupHigh      1.74      0.573      1.04      2.93

Rsquare=  0.018  (max  possible=  0.982  ) 

Likelihood  ratio  test=  5.03  on  1  df ,  p=0.0249 

Wald  test  =4.44  on  1  df ,  p=0.0350 

Score  (logrank)  test  =4.56  on  1  df ,  p=0.0327 

> logrank <- survdiff(Surv(Survival_Time, Dead) ~ group, data = data.test)
> logrank
Call:
survdiff(formula = Surv(Survival_Time, Dead) ~ group, data = data.test)

             N Observed Expected (O-E)^2/E (O-E)^2/V
group= Low  64       17     26.6      3.44      4.55
group=High 211       94     84.4      1.08      4.55

 Chisq= 4.6  on 1 degrees of freedom, p= 0.0328

> pv <- pchisq(logrank$chisq, 1, lower.tail = F)
> {
+     plot(sf, conf.int = F, main = "MDACC to consortium, SHP, stageI only",
+         xlab = "Time to Dead (Month)", ylab = "Survival", cex.lab = 1.2,
+         mark = c(1, 19), cex = 1.5, lwd = 2)
+     text(140, 0.9, pv.expr(pv), cex = 1.5)
+ }

Figure 4a. Predicting stage I lung cancer patient survival using SHP alone

Predicting stage I lung cancer patient survival using PR alone

> data.train <- combined[combined$type == "mda", ]
> fit <- rpart(Surv(Survival_Time, Dead) ~ ., data = data.train)
> print(fit)
n= 30

node), split, n, deviance, yval
      * denotes terminal node

1) root 30 41.707190 1.0000000
  2) PR>=0.04657526 17 11.093620 0.3051777 *
  3) PR< 0.04657526 13  8.068598 2.8613000 *
> data.test <- combined[combined$type == "Consortium" & combined$Stage ==
+     1, ]
> group <- ifelse(predict(fit, newdat = data.test) > 1, "High",
+     " Low")
> sf <- survfit(Surv(Survival_Time, Dead) ~ group, data = data.test)
> summary(coxph(Surv(Survival_Time, Dead) ~ group, data = data.test))
Call:
coxph(formula = Surv(Survival_Time, Dead) ~ group, data = data.test)
n=  275 

coef  exp(coef)  se(coef)  z  p 
groupHigh  0.355  1.43  0.197  1.80  0.071 

exp(coef)  exp(-coef)  lower  .95  upper  .95 
groupHigh  1.43  0.701  0.97  2.10 

Rsquare=  0.012  (max  possible=  0.982  ) 

Likelihood  ratio  test=  3.32  on  1  df ,  p=0.0683 

Wald  test  =3.26  on  1  df ,  p=0.0711 

Score  (logrank)  test  =3.29  on  1  df ,  p=0.0697 

> logrank <- survdiff(Surv(Survival_Time, Dead) ~ group, data = data.test)
> logrank
Call:
survdiff(formula = Surv(Survival_Time, Dead) ~ group, data = data.test)

             N Observed Expected (O-E)^2/E (O-E)^2/V
group= Low 137       44     53.5      1.67       3.3
group=High 138       67     57.5      1.55       3.3

 Chisq= 3.3  on 1 degrees of freedom, p= 0.0692

> pv <- pchisq(logrank$chisq, 1, lower.tail = F)
> {
+     plot(sf, conf.int = F, main = "MDACC to consortium, stageI only",
+         xlab = "Time to Dead (Month)", ylab = "Survival", cex.lab = 1.2,
+         mark = c(1, 19), cex = 1.5, lwd = 2)
+     text(140, 0.9, pv.expr(pv), cex = 1.5)
+ }

Figure 4b. Predicting stage I lung cancer patient survival using PR alone

Read normal tissue data

> mda.normal <- read.csv("MDA_data_normal.csv", row.names = 1)
> dim(mda.normal)
[1] 30 54
> mda.normal[1:4, 1:16]


    Dead Survival_Time Progression       TOE   COUP.TFb       TR4 DAX.1
737    0     84.819672           0 84.819672 0.42587400 0.4170522     0
739    1     63.901639           1 47.737705 0.71241461 0.7078501     0
749    0      6.491803           0  6.491803 0.09914638 0.5795992     0
756    0     72.950820           0 72.950820 0.49208097 0.6910128     0
         LXRb      RARa      RXRb  REV.ERBa  REV.ERBb  COUP.TFg      RORa
737 0.4245388 0.3603237 0.1484563 0.4978787 0.5986191 0.1063993 0.3778851
739 0.6956200 0.6239872 0.1837411 0.5680166 0.8705468 0.1573902 0.4897771
749 0.6110015 0.4097818 0.1818671 0.2851873 0.5119912 0.1322197 0.4820626
756 1.1302375 0.4390864 0.4070447 0.3802573 0.7447870 0.1530503 0.6154929
           GR     PPARg
737 0.4709204 0.1468401
739 0.5554739 0.6925421
749 0.4594592 0.5910218
756 0.7583135 1.1323468

Identification of NR expression as prognostic biomarkers in normal lung tissues from lung cancer patients.

> fit <- rpart(Surv(TOE, Progression) ~ ., data = mda.normal[,
+     -(1:2)])
> print(fit)
n=29 (1 observation deleted due to missingness)

node), split, n, deviance, yval
      * denotes terminal node

1) root 29 39.18022 1.0000000
  2) NGFIB3>=0.008300668 13 14.53515 0.4504191 *
  3) NGFIB3< 0.008300668 16 10.95998 2.0097130 *
> res <- rep(0, 30)
> for (i in 1:30) {
+     fit <- rpart(Surv(TOE, Progression) ~ ., data = mda.normal[-i,
+         -(1:2)])
+     res[i] <- (predict(fit, newdat = mda.normal[i, -(1:2)]) >
+         1)
+ }
> sf <- survfit(Surv(TOE, Progression) ~ res, data = mda.normal)
> logrank <- survdiff(Surv(TOE, Progression) ~ res, data = mda.normal)
> logrank

Call:
survdiff(formula = Surv(TOE, Progression) ~ res, data = mda.normal)

n=29, 1 observation deleted due to missingness.

       N Observed Expected (O-E)^2/E (O-E)^2/V
res=0 13        6    14.53      5.01      17.4
res=1 16       16     7.47      9.74      17.4

 Chisq= 17.4  on 1 degrees of freedom, p= 3.02e-05

> pv <- pchisq(logrank$chisq, 1, lower.tail = F)
> summary(coxph(Surv(TOE, Progression) ~ res, data = mda.normal))
Call:
coxph(formula = Surv(TOE, Progression) ~ res, data = mda.normal)

  n=29 (1 observation deleted due to missingness)
coef  exp(coef)  se(coef)  z  p 

res  2.33  10.2  0.656  3.54  0.00039 

exp(coef)  exp(-coef)  lower  .95  upper  .95 
res  10.2  0.0976  2.83  37.1 

Rsquare=  0.462  (max  possible=  0.985  ) 

Likelihood  ratio  test=  18.0  on  1  df ,  p=2.25e-05 

Wald test = 12.6 on 1 df, p=0.000393

Score  (logrank)  test  =17.4  on  1  df ,  p=3.02e-05 

> plot(sf, main = "MDACC Normal Tissue LOOCV", xlab = "Time to Progression (month)",
+     ylab = "Progression free survival", cex.lab = 1.5, mark = c(1,
+         19), cex = 1.2, lwd = 1)
> text(60, 0.2, pv.expr(pv), cex = 1.5)

Figure S2a. Kaplan-Meier plots of time to progression, LOOCV of recursive-partitioning tree model of the MDACC

Kaplan-Meier plots of overall survival time, LOOCV of recursive-partitioning tree model of the MDACC

> fit <- rpart(Surv(Survival_Time, Dead) ~ ., data = mda.normal[,
+     -(3:4)])
> print(fit)
n= 30

node), split, n, deviance, yval
      * denotes terminal node

1) root 30 41.70719 1.0000000
  2) MR>=0.04008524 18 14.96726 0.4520124 *
  3) MR< 0.04008524 12 12.19802 2.5391310 *
> res <- rep(0, 30)
> for (i in 1:30) {
+     fit <- rpart(Surv(Survival_Time, Dead) ~ ., data = mda.normal[-i,
+         -(3:4)])
+     res[i] <- (predict(fit, newdat = mda.normal[i, -(3:4)]) >
+         1)
+ }
> sf <- survfit(Surv(Survival_Time, Dead) ~ res, data = mda.normal)
> logrank <- survdiff(Surv(Survival_Time, Dead) ~ res, data = mda.normal)
> logrank

Call:
survdiff(formula = Surv(Survival_Time, Dead) ~ res, data = mda.normal)

       N Observed Expected (O-E)^2/E (O-E)^2/V
res=0 19        8    11.32     0.976      3.38
res=1 11        8     4.68     2.363      3.38

 Chisq= 3.4  on 1 degrees of freedom, p= 0.066

> pv <- pchisq(logrank$chisq, 1, lower.tail = F)
> summary(coxph(Surv(Survival_Time, Dead) ~ res, data = mda.normal))
Call: 

coxph (formula  =  Surv(Survival_Time ,  Dead)  ~  res,  data  =  mda. normal) 
n=  30 

coef  exp(coef)  se(coef)  z  p 
res  0.897  2.45  0.504  1.78  0.075 

exp(coef)  exp(-coef)  lower  .95  upper  .95 
res  2.45  0.408  0.914  6.58 

Rsquare= 0.097 (max possible= 0.958 )
Likelihood ratio test= 3.07 on 1 df, p=0.0796
Wald test = 3.17 on 1 df, p=0.075

Score  (logrank)  test  =3.38  on  1  df ,  p=0.066 

> plot(sf, main = "MDACC Normal Tissue LOOCV", xlab = "Survival Time (month)",
+     ylab = "Survival", cex.lab = 1.5, mark = c(1, 19), cex = 1.2,
+     lwd = 1)
> text(20, 0.2, pv.expr(pv), cex = 1.5)

Figure S2b. Kaplan-Meier plots of overall survival time, LOOCV of recursive-partitioning tree model of the MDACC

Kaplan-Meier estimates of survival time for NR expression when clinical variables are included in the analysis. The microarray data set from the four-institute consortium was divided into two groups, one for the training cohort and the other for the testing cohort.

> dat.train <- Consortium[Consortium$TESTTYPE == "Train", ]
> dat.test <- Consortium[Consortium$TESTTYPE == "Test", ]
> fit <- rpart(Surv(month, death) ~ ., data = dat.train)
> print(fit)

n=254 (1 observation deleted due to missingness)

node), split, n, deviance, yval
      * denotes terminal node

1) root 254 383.721300 1.00000000
  2) stage< 1.5 156 202.737700 0.66829300
    4) RORb< 4.665 120 155.062600 0.53791530
      8) RXRg< 4.93 11 1.864147 0.06792644 *
      9) RXRg>=4.93 109 139.269700 0.62690120
        18) LXRb>=7.065 39 37.405480 0.32035740
          36) REV.ERBa< 9.42 22 7.552179 0.08865569 *
          37) REV.ERBa>=9.42 17 17.242840 0.78262790 *
        19) LXRb< 7.065 70 91.576500 0.84658590
          38) RARg< 7.225 48 57.767080 0.64300480
            76) PNR>=4.6575 41 33.436310 0.51160520
              152) DAX.1< 4.6175 7 1.691907 0.15404630 *
              153) DAX.1>=4.6175 34 27.498220 0.60038320
                306) HNF4g>=4.08 23 18.245680 0.42965100
                  612) NGFIB3< 7.581667 10 4.337518 0.17020370 *
                  613) NGFIB3>=7.581667 13 8.612122 0.71861550 *
                307) HNF4g< 4.08 11 5.131342 1.06703600 *
            77) PNR< 4.6575 7 13.171150 2.55292800 *
          39) RARg>=7.225 22 26.131730 1.53438700 *
    5) RORb>=4.665 36 36.582160 1.20639900
      10) SHP>=7.47 7 7.459030 0.49655790 *
      11) SHP< 7.47 29 22.895000 1.53094000
        22) PNR>=4.8675 12 8.933589 0.95225090 *
        23) PNR< 4.8675 17 9.592501 2.10976900 *
  3) stage>=1.5 98 137.290600 1.89650400
    6) ERa>=5.505556 34 42.218900 1.16462800
      12) RARg< 7.3275 22 27.016830 0.77837040
        24) RARa>=6.36125 11 10.078790 0.42791790 *
        25) RARa< 6.36125 11 11.515790 1.36952300 *
      13) RARg>=7.3275 12 5.878286 2.43517100 *
    7) ERa< 5.505556 64 82.577530 2.56887000
      14) LRH.1< 4.295 23 33.859350 1.49440000
        28) PPARa< 4.495 15 15.370270 1.07328000 *
        29) PPARa>=4.495 8 13.913090 2.59727300 *
      15) LRH.1>=4.295 41 37.126450 3.64341900
        30) ERb< 5.58875 33 25.841990 3.12878300
          60) ERRb< 5.38 19 12.265540 2.28985200 *
          61) ERRb>=5.38 14 8.373323 4.61573700 *
        31) ERb>=5.58875 8 7.585137 5.27569700 *

> group <- ifelse(predict(fit, newdat = dat.test) > 1, "High",
+     " Low")
> sf <- survfit(Surv(month, death) ~ group, data = dat.test)
> summary(coxph(Surv(month, death) ~ group, data = dat.test))


Call: 

coxph (formula  =  Surv (month,  death)  ~  group,  data  =  dat. test) 

n=186  (1  observation  deleted  due  to  missingness) 
coef  exp(coef)  se(coef)  z  p 
groupHigh  0.355  1.43  0.235  1.51  0.13 

exp(coef)  exp(-coef)  lower  .95  upper  .95 
groupHigh  1.43  0.701  0.9  2.26 

Rsquare=  0.012  (max  possible=  0.975  ) 

Likelihood  ratio  test=  2.29  on  1  df ,  p=0.13 

Wald  test  =2.29  on  1  df ,  p=0.131 

Score  (logrank)  test  =2.31  on  1  df ,  p=0.129 

>  logrank  <-  survdiff (Surv (month,  death)  ~  group,  data  =  dat. test) 

>  logrank 

Call: 

survdiff (formula  =  Surv (month,  death)  ~  group,  data  =  dat. test) 

n=186,  1  observation  deleted  due  to  missingness. 

N  Observed  Expected  (0-E)~2/E  (0-E)~2/V 

group=  Low  98  35  41.4  0.992  2.30 

group=High  88  39  32.6  1.261  2.30 

Chisq=  2.3  on  1  degrees  of  freedom,  p=  0.13 

> pv <- pchisq(logrank$chisq, 1, lower.tail = F)
> plot(sf, conf.int = F, main = "Consortium Train to Test with clinical variable",
+     xlab = "Time to Dead (Month)", ylab = "Survival", cex.lab = 1.2,
+     mark = c(1, 19), cex = 1.5, lty = 1, lwd = 2)
> text(100, 0.9, pv.expr(pv), cex = 1.5)

Figure S3. Kaplan-Meier estimates of survival time for NR expression when clinical variables are included in the analysis.

Consortium unsupervised clustering

> Consortium <- read.csv("Consortium_data.csv", row.names = 1)
> hc <- hclust(dist(Consortium[, 10:57]))
> plot(hc)
> cluster <- cutree(hc, k = 2)
> sf <- survfit(Surv(month, death) ~ cluster, data = Consortium)
> summary(coxph(Surv(month, death) ~ cluster, data = Consortium))

Call: 

coxph (formula  =  Surv (month,  death)  ~  cluster,  data  =  Consortium) 

n=440  (2  observations  deleted  due  to  missingness) 
coef  exp(coef)  se(coef)  z  p 
cluster  0.266  1.30  0.144  1.85  0.065 

exp(coef)  exp(-coef)  lower  .95  upper  .95 
cluster  1.30  0.766  0.984  1.73 

Rsquare=  0.007  (max  possible=  0.997  ) 

Likelihood ratio test= 3.29 on 1 df, p=0.0699
Wald test = 3.42 on 1 df, p=0.0645
Score (logrank) test = 3.44 on 1 df, p=0.0637

> logrank <- survdiff(Surv(month, death) ~ cluster, data = Consortium)
> logrank
Call:
survdiff(formula = Surv(month, death) ~ cluster, data = Consortium)

n=440, 2 observations deleted due to missingness.

            N Observed Expected (O-E)^2/E (O-E)^2/V
cluster=1 333      168    180.1      0.81      3.43
cluster=2 107       68     55.9      2.61      3.43

Chisq=  3.4  on  1  degrees  of  freedom,  p=  0.0638 

> pv <- pchisq(logrank$chisq, 1, lower.tail = F)
> plot(sf, conf.int = F, main = "Consortium unsupervised clustering",
+     xlab = "Time to Dead (Month)", ylab = "Survival", cex.lab = 1.2,
+     mark = c(1, 19), cex = 1.5, lty = 1, lwd = 2)
> text(100, 0.9, pv.expr(pv), cex = 1.5)

1. ABSTRACT

In this paper, we extend the semiparametric approach proposed by Kong and Lee (2008, Biometrics) to the case
where  dose  effect  curves  follow  the  Emax  model  instead  of  the  median  effect  equation.  When  the  maximum  effects  for 
the  investigated  drugs  are  different,  we  give  a  procedure  to  obtain  the  additive  effect  based  on  the  Loewe  additivity 
model.  Then,  a  bivariate  thin  plate  spline  approach  is  applied  to  estimate  the  effect  beyond  additivity  along  with  its 
95%  point-wise  confidence  interval  as  well  as  its  95%  simultaneous  confidence  band  for  any  combination  dose.  Thus, 
synergy,  additivity,  and  antagonism  can  be  identified.  The  advantages  of  the  method  are  that  it  not  only  provides  an 
overall  assessment  of  the  combination  effect  on  the  entire  two-dimensional  dose  space  spanned  by  the  experimental 
doses,  but  also  enables  one  to  identify  complex  patterns  of  drug  interaction  in  combination  studies.  In  addition,  this 
approach  is  robust  to  outliers.  To  illustrate  this  procedure,  two  case  studies  provided  by  Dr.  William  R.  Greco  at 
Roswell  Park  Cancer  Institute  were  analyzed. 

2.  INTRODUCTION 


Studies  of  interactions  among  biologically  active  agents,  such  as  drugs,  carcinogens,  or  environmental  pollutants 
have  become  increasingly  important  in  many  branches  of  biomedical  research.  For  example,  in  cancer  chemotherapy, 
the  therapeutic  effect  of  many  anticancer  drugs  is  limited  when  they  are  used  as  single  drugs.  Finding  combination 
therapies  with  increased  treatment  effect  and  decreased  toxicity  effect  is  an  active  and  promising  research  area  (1).  An 
effective  and  accurate  evaluation  of  drug  interaction  for  in  vitro  and/or  in  vivo  studies  can  help  to  determine  whether  a 
combination  therapy  should  be  further  investigated. 

The literature supports the view that the Loewe additivity model should be considered the gold standard for defining drug interaction (2-5). The Loewe additivity model defines an additive effect by the following equation:

$$\frac{d_1}{D_{y,1}} + \frac{d_2}{D_{y,2}} = 1. \tag{E1}$$

Here $y$ is the predicted additive effect produced by the combination dose $(d_1, d_2)$ when the two drugs do not interact, and $D_{y,1}$ and $D_{y,2}$ are the respective doses of drug 1 and drug 2 required to produce the same effect $y$ when applied alone. If we know the dose-effect relationship for each single agent, say $E(d) = f_i(d)$ for agent $i$ ($i = 1, 2$), we can obtain the dose $D_{y,i}$ by using the inverse function of $f_i$, denoted $f_i^{-1}(y)$. Replacing $D_{y,1}$ and $D_{y,2}$ in equation (E1) with $f_1^{-1}(y)$ and $f_2^{-1}(y)$, respectively, gives an equation in the single variable $y$:

$$\frac{d_1}{f_1^{-1}(y)} + \frac{d_2}{f_2^{-1}(y)} = 1. \tag{E2}$$

By solving equation (E2), we obtain the predicted additive effect $y$. If the observed effect at $(d_1, d_2)$ is more than (equal to, or less than) the predicted effect, we say that the combination dose $(d_1, d_2)$ is synergistic (additive, or antagonistic). When the dose-effect curve is decreasing, for example when plotting percent cell survival versus dose, an effect that is "more than the predicted effect" means that the measured value of the observed effect is smaller than that of the predicted effect.


In our previous studies (6-8), we found that Chou and Talalay’s (9) median effect equation was appropriate to describe the dose-effect relationships. Chou and Talalay’s median effect equation, in its nonlinear form, can be written as follows:

$$E = \frac{(d/ED_{50})^m}{1 + (d/ED_{50})^m}, \tag{E3}$$

where ED50 is the dose required to produce 50% of the maximum effect, and m is the slope factor (Hill coefficient), measuring the sensitivity of the effect to the dose range of the drug. For the data in the case studies provided by Dr. Greco (see Section 4 for details), we found that the median effect equation (E3) cannot describe the marginal dose-effect relationship adequately, since the plateau of the effect does not go to zero when a large dose of a drug is applied. Instead, the following Emax model (E4) presented by Ting (10) describes the dose-effect relationship very well:


$$E = E_0 - \frac{(d/ED_{50})^m E_{max}}{1 + (d/ED_{50})^m}. \tag{E4}$$

In the Emax model (E4), E0 is the base effect, corresponding to the measurement of response when no drug is applied; Emax is the maximum effect attributable to the drug; ED50 is the dose level producing half of Emax, i.e., the dose level required to produce the effect at a value of E0-0.5Emax (Figure 1, Panel A); and d is the dose level that produces the effect E. Thus, E0-Emax is the asymptotic net effect when a large dose of the drug is applied. Different maximum effects for agents may reflect different mechanisms of action for these drugs (11). In in vitro studies, one of the commonly used endpoints is cell growth, which is used to measure the effects of inhibitors. When no drugs (or no inhibitors) are applied, cell proliferation attains its largest value. In this case, the dose-effect curve is similar to the one shown in Figure 1, Panel A, where Emax > 0. The effect range determined by the dose-effect curve is the interval (E0-Emax, E0), and the asymptotic measurement for the maximum drug effect is E0-Emax.
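The role of each parameter in (E4) can be checked numerically. The following is a minimal Python sketch, assuming a positive slope factor m; the function name `emax_effect` and the parameter values are hypothetical and chosen only for illustration, not taken from the case studies.

```python
def emax_effect(d, e0, emax, ed50, m):
    """Effect under the Emax model (E4): E = E0 - Emax * x / (1 + x), x = (d / ED50)^m."""
    if d == 0:
        return e0  # no drug applied: the base effect E0
    x = (d / ed50) ** m
    return e0 - emax * x / (1.0 + x)


# Hypothetical parameter values, chosen only for illustration.
E0, EMAX, ED50, M = 1.0, 0.8, 2.0, 1.5

at_zero = emax_effect(0.0, E0, EMAX, ED50, M)    # base effect E0
at_ed50 = emax_effect(ED50, E0, EMAX, ED50, M)   # E0 - 0.5 * Emax
at_large = emax_effect(1e9, E0, EMAX, ED50, M)   # approaches E0 - Emax
```

With these values the effect at d = ED50 is E0-0.5Emax and the effect at a very large dose approaches E0-Emax, matching the interpretation of the parameters given above.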

To  investigate  drug  interaction,  theoretically,  we  expect  the  measurements  for  the  endpoints  to  be  similar 
when  no  drug  is  applied.  The  measurements  without  any  drug  applied  are  used  as  controls.  Due  to  certain 
environmental  factors  other  than  experimental  conditions,  the  measurements  for  the  controls  under  different 
environmental  conditions  may  be  different.  Thus,  one  may  need  to  standardize  the  observed  effects  by  the  mean  of  the 
control for each environmental condition (2, 3), and then take E0 = 1. In this paper, we will consider the following dose-effect curve for each drug:

$$E = 1 - \frac{(d/ED_{50})^m E_{max}}{1 + (d/ED_{50})^m}, \tag{E5}$$


which assumes an effect of 1 when no drug is applied. Once we obtain the dose-effect curve for each single drug, we can use the Loewe additivity model (E1) to obtain the additive effect for any combination dose, in particular for the combination doses with observed effects. Thus, we may obtain the differences between the observed effects and the predicted additive effects at each observed combination dose. We use the bivariate thin plate splines approach (12) to estimate the relationship between these differences and the combination doses. Consequently, a response surface of the differences over the combination doses is obtained, and 95% confidence surfaces of the response surface can be constructed. When the dose-response curves decrease with increasing dose, an observed effect that is less than the predicted additive effect is stronger than the predicted effect, indicating that the combination dose is synergistic. Conversely, an observed effect that is larger than the predicted additive effect is weaker than the predicted effect, indicating that the combination dose is antagonistic. However, these inferences should be based on sound statistical considerations. Based on the fitted response surface and its upper and lower confidence surfaces, we can judge whether the difference is significantly less than zero, not different from zero, or greater than zero. Thus, the patterns of drug interaction in terms of synergy, additivity, and antagonism can be obtained.

We organize our presentation as follows. In Section 3.1, we describe the underlying stochastic assumption for the dose-effect curve and the procedure to estimate the parameters in each marginal dose-effect curve. In Section 3.2, we present how to obtain the additive response surface based on the Loewe additivity model, especially in the case when the maximum effects of the drugs are different. In Section 3.3, we present how to assess the effect surface beyond the additivity surface and how to construct its 95% confidence surfaces. Thus, drug interactions in terms of synergy, additivity, or antagonism can be identified for all combination doses in the region containing the combination design points. In Section 4, we illustrate the procedure of Section 3 by analyzing the two case studies provided by Dr. Greco. The last section is devoted to a short discussion.


3.  STATISTICAL  METHOD 


Assume that the observed data are $(d_{1i}, d_{2i}, E_i)$ for $i = 1, \ldots, n$. For each $i$, $(d_{1i}, d_{2i})$ is the observed combination dose and $E_i$ is the corresponding observed effect. We call the observations with only drug 1 or drug 2 applied alone marginal observations. That is, the marginal observations for drug 1 are the observations $(d_{1i}, d_{2i}, E_i)$ with $d_{2i} = 0$ ($i = 1, \ldots, n$), and the marginal observations for drug 2 are the observations $(d_{1i}, d_{2i}, E_i)$ with $d_{1i} = 0$ ($i = 1, \ldots, n$). The marginal dose-effect curves will be estimated from the marginal observations, as presented in Section 3.1. It is commonly accepted that the additive effect should be obtained based on the dose-effect relationships for each
individual drug. In Section 3.2, we present how to obtain the predicted effect at combination dose $(d_1, d_2)$ based on the Loewe additivity model (E1) and the marginal dose-effect curves (E5). We denote the predicted effect as $F_p(d_1, d_2)$.

By definition, there is no drug interaction when only a single drug is applied. Therefore, the term for drug interaction is meaningful only for the combination dose $(d_1, d_2)$ with nonzero $d_1$ and $d_2$. In Section 3.3, we develop a procedure to estimate the effect beyond additivity for any combination dose $(d_1, d_2)$ with nonzero $d_1$ and $d_2$, denoted by $f(d_1, d_2)$.


3.1.  Estimating  dose  effect  curves 

Chou and Talalay (9), Chou (4), and Kong and Lee (6) estimate the parameters in the median effect equation
(E 3) by using the transformation $\log\{E/(1-E)\} = m\log(d/ED_{50}) = \alpha + m\log(d)$ and applying the least squares
method in the linear regression setting, where $\alpha = -m\log(ED_{50})$. However, in our case studies (see Section 4), the
experiments include many low doses whose measured effects are larger than 1 after adjusting the effect at control to be 1.
For such measurements $E/(1-E)$ is negative, so its logarithm is undefined and a similar transformation for models (E 3)
and (E 5) cannot be carried out. Since the measurements are continuous, we propose to apply nonlinear least squares
regression to estimate the parameters in models (E 3) and (E 5), assuming that an additive stochastic error distributed as
$N(0, \sigma^2)$ exists on the right-hand side of the two models. Note that estimating the dose-effect curve for drug $i$
requires only the marginal observations for drug $i$, $i = 1, 2$. Nonlinear least squares regression was applied to estimate
the parameters in the marginal dose-effect curves in the two case studies in Section 4.
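
The nonlinear least squares fit of a marginal dose-effect curve can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the decreasing Emax form of model (E 5), $E = 1 - E_{max}(d/ED_{50})^m / \{1 + (d/ED_{50})^m\}$, uses SciPy's `curve_fit`, and fits $ED_{50}$ on the log scale for numerical stability. The function names, starting values, and bounds are hypothetical choices.

```python
import numpy as np
from scipy.optimize import curve_fit


def emax_model(d, emax, ed50, m):
    """Decreasing Emax curve: effect is 1 at dose 0 and tends to 1 - emax."""
    x = (d / ed50) ** m
    return 1.0 - emax * x / (1.0 + x)


def fit_emax(dose, effect):
    """Least squares estimates (emax, ed50, m) from one drug's marginal data.

    Effects above 1 are allowed, which is why no logit transform is used.
    Assumes all doses are strictly positive.
    """
    dose = np.asarray(dose, dtype=float)
    effect = np.asarray(effect, dtype=float)

    def curve(d, emax, log_ed50, m):
        # ED50 is fitted on the log scale because doses span several decades.
        return emax_model(d, emax, np.exp(log_ed50), m)

    start = [0.9, np.log(np.median(dose)), 1.0]          # crude starting values
    bounds = ([0.0, -np.inf, 0.01], [1.0, np.inf, 20.0])  # assumed parameter ranges
    est, _ = curve_fit(curve, dose, effect, p0=start, bounds=bounds)
    emax, log_ed50, m = est
    return float(emax), float(np.exp(log_ed50)), float(m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dose = np.logspace(-5, 0, 11)                 # synthetic doses, not the study data
    noisy = emax_model(dose, 0.88, 1e-2, 1.5) + rng.normal(0.0, 0.02, dose.size)
    print(fit_emax(dose, noisy))
```

Only the marginal observations of one drug enter each fit, so the function would be called once per drug.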


3.2.  Predicting  additive  effects 

In this subsection, we present how to obtain the predicted effect based on the Loewe additivity model (E 1)
when model (E 5) is applied as the marginal dose-effect curve for each drug. When model (E 5) is applied, the dose
required to produce effect $E$ is given by

$$d = ED_{50}\left(\frac{1-E}{E-(1-E_{max})}\right)^{1/m}.$$


However, the maximum effects for the two drugs may be different. Without loss of generality, we assume that the
maximum effect of drug 1 is larger than the maximum effect of drug 2, i.e., $E_{max,1} > E_{max,2}$. In this case, when the
dose-effect curves are decreasing, neither drug applied alone can produce an effect in $(0, 1-E_{max,1})$ (Figure 1, Panel B).
In the following, we develop the procedure to obtain the predicted additive effect based on the Loewe additivity model
(E 1), where we can see that the predicted effect will be in the interval $(1-E_{max,1}, 1)$ for any combination dose
$(d_1, d_2)$.


Recall that the Loewe additivity model (E 1) can be rewritten as $d_1 + (D_{y,1}/D_{y,2})\,d_2 = D_{y,1}$, and the ratio
$D_{y,1}/D_{y,2}$ (denoted as $\rho(y)$) is often called the relative potency of drug 2 versus drug 1 at effect level $y$, which
means that 1 unit of drug 2 produces the same effect as $\rho(y)$ units of drug 1. Generally speaking, the relative
potency $\rho(y)$ is dose-dependent (7). When there is no drug interaction, the combination dose $(d_1, d_2)$ will
produce the same effect as drug 1 alone at dose level $D_{y,1}$, which equals $d_1 + \rho(y)d_2$, or drug 2 alone at dose $D_{y,2}$,
which equals $\rho(y)^{-1}d_1 + d_2$ (Figure 2, Panel A). All combination doses $(d_1, d_2)$ on the line $\overline{PQ}$ will have the
predicted effect $y$, where $\overline{PQ}$ is the line connecting the points $P = (D_{y,1}, 0)$ and $Q = (0, D_{y,2})$ (Figure 2, Panel A). This
line $\overline{PQ}$ is often called an additive isobole (2, 3).

When $E_{max,1} > E_{max,2}$, as illustrated in Figure 1, Panel B, we can calculate the dose of drug 1 required to
produce the maximum effect of drug 2, i.e., $D_{1-E_{max,2},1} = ED_{50,1}\{E_{max,2}/(E_{max,1}-E_{max,2})\}^{1/m_1}$. Note that the range
of the effect for drug 2 is $(1-E_{max,2}, 1)$, which could be produced by drug 1 alone at a dose level between 0 and
$D_{1-E_{max,2},1}$. Based on the Loewe additivity model, for any level of effect $y$ in $(1-E_{max,2}, 1)$, the associated additive
isobole is the line connecting $(D_{y,1}, 0)$ and $(0, D_{y,2})$. When $y$ varies from $1-E_{max,2}$ to 1, the dose of drug 1 required to
produce effect $y$ varies from $D_{1-E_{max,2},1}$ to 0, while the dose of drug 2 required to produce effect $y$ varies from
infinitely large to 0. In particular, when $y$ is close to $1-E_{max,2}$, the dose of drug 1 required to produce such an effect $y$
will be close to $D_{1-E_{max,2},1}$, and the dose of drug 2 required to produce such an effect $y$ will go to infinity. Figure 2,
Panel B shows four typical additive isoboles (dashed lines), which connect equally effective doses of drug 1 and drug 2
at different effect levels. From left to right, the effect level decreases in magnitude. The additive isoboles may not be
parallel since the relative potency may not be constant. When $y$ varies in $(1-E_{max,2}, 1)$, these additive isoboles cover
the region between the two solid vertical lines (Figure 2, Panel B). Meanwhile, any combination dose $(d_1, d_2)$ with
$d_1 < D_{1-E_{max,2},1}$ must lie on one of these isoboles. Therefore, for any combination dose $(d_1, d_2)$ with
$d_1 < D_{1-E_{max,2},1}$, the predicted additive effect, say $y$, can be obtained by solving the following nonlinear equation for $E$:


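$$\frac{d_1}{ED_{50,1}\left(\dfrac{1-E}{E-(1-E_{max,1})}\right)^{1/m_1}} + \frac{d_2}{ED_{50,2}\left(\dfrac{1-E}{E-(1-E_{max,2})}\right)^{1/m_2}} = 1.$$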


Now we examine the predicted effect for the combination dose $(d_1, d_2)$ with $d_1 \geq D_{1-E_{max,2},1}$. When
$d_1 \geq D_{1-E_{max,2},1}$, drug 1 alone at dose $d_1$ produces an effect
$1 - E_{max,1}(d_1/ED_{50,1})^{m_1} / \{1 + (d_1/ED_{50,1})^{m_1}\}$, an effect beyond $1-E_{max,2}$, which cannot be produced by
drug 2 alone at any dose level. In this case, the predicted additive effect is the effect produced by drug 1 alone at
dose level $d_1$; if the effect at the combination dose is more than the effect produced by drug 1 alone, then drug 2
potentiates the effect of drug 1 and synergy occurs. Alternatively, since drug 2 alone cannot produce such an effect, we
could consider $D_{y,2}$ as being infinitely large, so that the Loewe additivity model reduces to $d_1/D_{y,1} = 1$. No matter
which approach we take, the predicted effect $y$ is the same, and it can be obtained from the following equation:


$$f_1(D_{y,1}) = f_1(d_1) = 1 - \frac{E_{max,1}\,(d_1/ED_{50,1})^{m_1}}{1 + (d_1/ED_{50,1})^{m_1}}.$$


Thus, we can obtain the predicted effect for any combination dose $(d_1, d_2)$. Similar to the notation in Kong and Lee (6),
we denote the predicted effect as $F_p(d_1, d_2)$ at the combination dose $(d_1, d_2)$. In the following subsection, we develop
the procedure to estimate the effect beyond additivity, denoted by $f(d_1, d_2)$, and to construct its 95% confidence
interval and confidence band. We will assess drug interaction based on the estimated $\hat{f}(d_1, d_2)$ and its confidence
band.
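
The two cases of Section 3.2 can be combined into one routine, sketched below under stated assumptions: both marginal curves follow the decreasing Emax form of model (E 5), drug 1 has the larger maximum effect, and the nonlinear equation is solved by simple bisection. The solver choice and all function names are illustrative and are not taken from the paper.

```python
import math


def emax_effect(d, emax, ed50, m):
    """Model (E 5) for one drug: 1 - emax * x / (1 + x) with x = (d / ed50)^m."""
    x = (d / ed50) ** m
    return 1.0 - emax * x / (1.0 + x)


def loewe_sum(y, d1, d2, drug1, drug2):
    """d1 / D_{y,1} + d2 / D_{y,2}, written without dividing by D_{y,i}."""
    total = 0.0
    for d, (emax, ed50, m) in ((d1, drug1), (d2, drug2)):
        # d / D_y = (d / ed50) * ((y - (1 - emax)) / (1 - y))^(1/m)
        total += (d / ed50) * ((y - (1.0 - emax)) / (1.0 - y)) ** (1.0 / m)
    return total


def predicted_additive_effect(d1, d2, drug1, drug2, tol=1e-12):
    """Loewe-additive effect at (d1, d2); each drug is a tuple (emax, ed50, m).

    Assumes decreasing curves and that emax of drug1 >= emax of drug2.
    """
    emax1, ed50_1, m1 = drug1
    emax2 = drug2[0]
    if d1 == 0.0 and d2 == 0.0:
        return 1.0
    if emax1 > emax2:
        # Dose of drug 1 that reaches the plateau 1 - emax2 of drug 2.
        threshold = ed50_1 * (emax2 / (emax1 - emax2)) ** (1.0 / m1)
    else:
        threshold = math.inf
    if d1 >= threshold:
        # Drug 2 cannot reach this effect, so the prediction is drug 1 alone.
        return emax_effect(d1, *drug1)
    lo, hi = 1.0 - emax2, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if loewe_sum(mid, d1, d2, drug1, drug2) > 1.0:
            hi = mid      # the sum increases with y, so the root lies below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Bisection is used here only because the left-hand side of the equation is monotone in the effect level on $(1-E_{max,2}, 1)$; any bracketing root finder would serve the same purpose.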


3.3.  Assessing  drug  interactions  using  bivariate  thin  plate  splines 

In Section 3.2, we presented how to obtain the predicted additive effect for any combination dose $(d_1, d_2)$,
in particular for any combination dose $(d_1, d_2)$ with an observed effect. Thus, we can calculate the differences between
the observed effects and the predicted effects at every observed combination dose $(d_1, d_2)$. By definition, there is no
drug interaction when a single drug is used alone. Therefore, we set the differences to zero for the marginal
observations, that is, the combination doses $(d_1, d_2)$ with only one nonzero component. A bivariate thin plate spline is
applied to estimate the differences as a function of the combination dose, say $f(d_1, d_2)$. When the dose-effect curves
are decreasing, $f(d_1, d_2) < 0$ indicates that the effect is more than the predicted effect at $(d_1, d_2)$, thus the combination
dose $(d_1, d_2)$ is synergistic. Conversely, $f(d_1, d_2) > 0$ indicates that the combination dose $(d_1, d_2)$ is antagonistic.
Kong and Lee (6) used the distinct observed combination doses as the knots for the bivariate thin plate splines (12). The
choice of knots is easier if the number of combination doses is not large and the combination doses are not too close to
each other, such as those from factorial designs or uniform designs (13). However, when ray designs are applied, the
combination doses at low doses are very close to each other, and some columns of the design matrices (i.e., $\Omega$ and
$Z_1$ in the following notation) may be highly correlated, which results in a nearly singular matrix for estimating the
parameters in the function $f$. If that happens, a low-rank smoothing thin plate spline (14), for example one with knots
formed by selecting observed combination doses that are separated by more than some pre-specified small distance,
should be applied to avoid the singularity of the involved matrix due to the low rank of the design matrix.
Alternatively, one may apply an appropriate transformation to the dose, such as the log-transformation, to make the
experimental combination doses evenly distributed over a certain region on the transformed scale, so that the effect
beyond additivity can be estimated by using bivariate thin plate splines without such a difficulty.
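
One way to form a reduced knot set from closely spaced ray-design doses is a greedy thinning pass, sketched below. The text only states that knots should be separated by more than a pre-specified small distance; the greedy order-dependent rule and the name `select_knots` are assumptions made for illustration.

```python
import math


def select_knots(doses, min_dist):
    """Keep a dose pair as a knot only if it lies farther than min_dist
    (Euclidean distance) from every knot kept so far."""
    knots = []
    for point in doses:
        if all(math.dist(point, kept) > min_dist for kept in knots):
            knots.append(tuple(point))
    return knots


if __name__ == "__main__":
    # Hypothetical (d1, d2) pairs; the first two and the third and fourth are near-duplicates.
    observed = [(0.0, 0.0), (0.01, 0.0), (1.0, 0.0), (1.0, 0.005), (0.0, 1.0)]
    print(select_knots(observed, min_dist=0.1))
```

The result depends on the order in which the doses are visited, so in practice the doses might be sorted first, or the thinning might be applied on a transformed dose scale.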

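Suppose the selected knots are $(\kappa_{1k}, \kappa_{2k})$ $(k = 1, \ldots, K)$; then the bivariate thin plate spline can be
expressed in the following form:

$$f(d_1, d_2) = \gamma_0 + \gamma_1 d_1 + \gamma_2 d_2 + \sum_{k=1}^{K} v_k\,\eta\left(\left\|(d_1, d_2)^T - (\kappa_{1k}, \kappa_{2k})^T\right\|\right),$$

where $\gamma = (\gamma_0, \gamma_1, \gamma_2)^T$ and $v = (v_1, \ldots, v_K)^T$ are the parameters in the thin plate spline function $f$, and

$$\eta(r) = \frac{1}{16\pi}\, r^2 \log r^2 \ \text{ for } r > 0 \quad \text{and} \quad \eta(r) = 0 \ \text{ for } r = 0.$$

The distance in the expression is the Euclidean distance. Let us denote

$$\Omega = \left[\eta\left(\left\|(\kappa_{1k}, \kappa_{2k})^T - (\kappa_{1k'}, \kappa_{2k'})^T\right\|\right)\right]_{1 \le k, k' \le K},$$

$$Y_r = \left[\{E_i - F_p(d_{1i}, d_{2i})\}\, I\{d_{1i} \ne 0 \ \&\ d_{2i} \ne 0\}\right]_{1 \le i \le n},$$

$$X = [1, d_{1i}, d_{2i}]_{1 \le i \le n}, \qquad Z_1 = \left[\eta\left(\left\|(d_{1i}, d_{2i})^T - (\kappa_{1k}, \kappa_{2k})^T\right\|\right)\right]_{1 \le i \le n,\ 1 \le k \le K},$$

and $T^T = [1, \kappa_{1k}, \kappa_{2k}]_{1 \le k \le K}$.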
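
The radial basis function and one row of the basis matrix can be written directly from the definitions above. This is a small sketch using only the Python standard library; the function names are hypothetical, and the coefficients passed to `tps_value` would come from the estimation step described next in the text.

```python
import math


def tps_eta(r):
    """Thin plate spline radial function: r^2 log(r^2) / (16 pi), with eta(0) = 0."""
    if r == 0.0:
        return 0.0
    return r * r * math.log(r * r) / (16.0 * math.pi)


def tps_basis_row(d1, d2, knots):
    """Basis values eta(||(d1, d2) - knot_k||) for k = 1..K, i.e. one row of Z_1."""
    return [tps_eta(math.dist((d1, d2), knot)) for knot in knots]


def tps_value(d1, d2, gamma, v, knots):
    """Evaluate gamma0 + gamma1*d1 + gamma2*d2 + sum_k v_k * eta(distance to knot k)."""
    g0, g1, g2 = gamma
    radial = sum(vk * b for vk, b in zip(v, tps_basis_row(d1, d2, knots)))
    return g0 + g1 * d1 + g2 * d2 + radial
```

Stacking `tps_basis_row` over all observed doses gives $Z_1$, and evaluating it at the knots themselves gives $\Omega$.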

Following the notation of Kong and Lee (6) and Green and Silverman (12), consider a QR decomposition of $T^T$, say
$T^T = FG$, where $F$ is a $K \times K$ orthogonal matrix and $G$ is a $K \times 3$ upper triangular matrix. Let $F_2$ be the last $K-3$
columns of $F$. Set $u = (F_2^T \Omega F_2)^{1/2} F_2^T v$ and $Z = Z_1 F_2 (F_2^T \Omega F_2)^{-1/2}$, where $(F_2^T \Omega F_2)^{1/2}$ is the matrix
square root of $F_2^T \Omega F_2$. Based on the approach proposed by Ruppert, Wand, and Carroll (15) and Wang (16), detailed by
Kong and Lee (6) in this setting, the parameters $\gamma$ and $u$ can be obtained by solving the following mixed effect
model:


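$$Y_r = X\gamma + Zu + \varepsilon \quad \text{with} \quad \mathrm{Cov}\begin{pmatrix} u \\ \varepsilon \end{pmatrix} = \begin{pmatrix} \sigma_u^2 I_{K-3} & 0 \\ 0 & \sigma_\varepsilon^2 I_n \end{pmatrix}. \qquad \text{(E 6)}$$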


Thus, the parameters can be estimated by $(\hat{\gamma}^T, \hat{u}^T)^T = (C^T C + \lambda D)^{-1} C^T Y_r$ with $\lambda = \sigma_\varepsilon^2/\sigma_u^2$, $C = [X\ Z]$, and
$D = \mathrm{diag}(0, 0, 0, 1, \ldots, 1)$, where the number of zeros in the matrix $D$ corresponds to the number of $\gamma_i$'s $(i = 0, 1, 2)$ and
the number of ones corresponds to the number of $u_i$'s $(i = 1, \ldots, K-3)$. Under this notation, for any combination
dose $(d_1, d_2)$, $f(d_1, d_2)$ can be predicted by $\hat{f}(d_1, d_2) = \hat{\gamma}_0 + \hat{\gamma}_1 d_1 + \hat{\gamma}_2 d_2 + Z_0\hat{u}$ with
$Z_0 = \left[\eta\left(\left\|(d_1, d_2)^T - (\kappa_{1k}, \kappa_{2k})^T\right\|\right)\right]_{1 \le k \le K} F_2 (F_2^T \Omega F_2)^{-1/2}$, and an approximate $100(1-\alpha)\%$
point-wise confidence interval for $f(d_1, d_2)$ can be constructed by

$$\hat{f}(d_1, d_2) \pm z_{\alpha/2}\,\hat{\sigma}_\varepsilon \sqrt{C_d\,(C^T C + \lambda D)^{-1} C_d^T}, \qquad \text{(E 7)}$$

where $C_d = (1, d_1, d_2, Z_0)$ and $z_{\alpha/2}$ is the upper $(\alpha/2) \times 100\%$ percentile of the standard normal distribution. Thus, we
can construct 95% point-wise lower and upper confidence surfaces for $f$ and obtain the bounds for $f = 0$ by taking the
intercept lines of the confidence surfaces with the dose plane. The combination doses in the area outside the bounds
with $f < 0$ are claimed to be synergistic, the combination doses inside the bounds are claimed to be additive, and the
combination doses in the area outside the bounds with $f > 0$ are claimed to be antagonistic.


Note that based on the 95% point-wise confidence surfaces (E 7), some combination doses which are additive may
be claimed as synergistic or antagonistic based on a single surface. To be conservative and to control the family-wise
error rate, we also construct a simultaneous confidence band, which has the same form as equation (E 7) except
that $z_{\alpha/2}$ is replaced by $\sqrt{EDF \times F^{\alpha}_{EDF,\,n-EDF}}$ (17), where $EDF$ is the effective degrees of freedom of the
resulting bivariate smoothing spline (12), defined as the trace of the matrix $C(C^T C + \lambda D)^{-1} C^T$, and
$F^{\alpha}_{EDF,\,n-EDF}$ is the upper $100 \times \alpha$ percentile of the $F$ distribution with $EDF$ and $n - EDF$ degrees of freedom. Here $n$ is
the total number of observations excluding controls. For each of the two case studies presented in the next section, we
report plots of the patterns of drug interaction based on the 95% point-wise confidence intervals (CI) and the 95%
simultaneous confidence band (SCB), respectively (see Figures 4 and 6).
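
The multiplier that replaces $z_{\alpha/2}$ in the simultaneous band can be computed from the $F$ distribution, as in the following sketch. It assumes SciPy is available and that the effective degrees of freedom are supplied as a number; the function name is hypothetical.

```python
import math

from scipy.stats import f as f_dist


def scb_multiplier(edf, n, alpha=0.05):
    """sqrt(EDF * F), where F is the upper 100*alpha percentile of the
    F distribution with EDF and n - EDF degrees of freedom."""
    upper_quantile = f_dist.ppf(1.0 - alpha, edf, n - edf)
    return math.sqrt(edf * upper_quantile)


if __name__ == "__main__":
    # EDF and n as reported for the two case studies in Section 4.
    print(scb_multiplier(119, 761))
    print(scb_multiplier(91, 769))
```

With the values of $EDF$ and $n$ reported in Section 4, the multipliers should come out close to the 12.20 and 10.77 quoted there, far larger than the normal quantile of about 1.96, which is why the simultaneous band is much wider than the point-wise interval.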


4.  CASE  STUDIES 


The following two data sets were provided by Dr. Greco. The two data sets resulted from examining the joint
effect of trimetrexate (TMQ) and AG2034 with cells grown in medium with different levels of folic acid: 2.3 μM in the
first experiment (called the Low FA experiment), and 78 μM in the second experiment (called the High FA experiment).
Here TMQ is a lipophilic inhibitor of the enzyme dihydrofolate reductase, and AG2034 is an inhibitor of the enzyme
glycinamide ribonucleotide formyltransferase. All drug concentrations are in μM. The endpoint was the growth of
HCT-8 human ileocecal adenocarcinoma cells, in 96-well plates, as measured by the SRB protein stain. Treatments of
cells in wells by drugs were randomized across the plates. Each 96-well plate included 8 wells as instrumental blanks
(no cells); thus 88 wells were used for drug treatments. Five replicate plates were used for each set of 88 treated wells.
Each of these two large data sets came from two 5-plate stacks with a maximum of 880 treated wells per experiment.
There were 110 control wells per experiment with no drugs applied to the cells. Ray designs were used for these two
experiments, and the experimental doses were distributed in 14 rays, including two rays for TMQ and AG2034 when
used alone. Complete experimental details and mechanistic implications are included in Faessel et al. (18). We assumed
that the first observation recorded for each dose or combination dose from the first 5-plate stack was from the same
plate, say the 1st plate, the second observation from the 2nd plate, and so on; likewise, we assumed that the first
observation recorded for each dose or combination dose from the second 5-plate stack was from the same plate, say the
6th plate, the second observation from the 7th plate, and so on. Under these assumptions, we have a total of 10 plates
for each of the two data sets.

To examine whether there is a significant difference among the plates for control groups, we applied one-way
analysis of variance (ANOVA) to the controls in each individual data set. The p-values were 0.001 for the Low FA
experimental data and 0.005 for the High FA experimental data. The results indicate a significant plate effect among
the 10 plates for each experiment, that is, the inter-plate variability is high. To attenuate the effect of the inter-plate
variability, a standardization procedure, in which the effect readings were divided by the mean of the controls in the
associated plate, was applied to each data set. Thus, the mean for controls within each plate is standardized to 1, and
we will treat the effect for controls as 1. In addition to 110 controls for each experiment, we have 761 observations for
the Low FA experiment and 769 observations for the High FA experiment. The statistical method described in Section
3 was applied to each of the two standardized data sets, and the results for each experiment are presented in the
following two subsections.
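
The plate-wise standardization amounts to dividing every reading by the mean of the control wells on the same plate. A minimal sketch follows; the record layout `(plate, is_control, value)` and the function name are assumptions made for illustration, not the format of the original data files.

```python
from collections import defaultdict
from statistics import mean


def standardize_by_plate(readings):
    """Divide each reading by the mean control reading of its own plate.

    readings: iterable of (plate, is_control, value) tuples.
    Returns a list of (plate, is_control, standardized_value) tuples.
    """
    readings = list(readings)
    controls = defaultdict(list)
    for plate, is_control, value in readings:
        if is_control:
            controls[plate].append(value)
    plate_mean = {plate: mean(values) for plate, values in controls.items()}
    return [(plate, is_control, value / plate_mean[plate])
            for plate, is_control, value in readings]
```

After this step the controls on every plate average exactly 1, so individual treated wells can legitimately exceed 1, which is the situation that rules out the logit transformation.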

Lee et al. (19) performed extensive exploratory data analyses and identified 129 outliers out of 871 (14.8%)
effect  readings  in  the  Low  FA  experiment  and  126  outliers  out  of  879  (14.3%)  effect  readings  in  the  High  FA 
experiment.  To  compare  with  the  results  obtained  by  Lee  et  al.  (19),  we  also  applied  the  statistical  method  described  in 
Section  3  to  the  data  sets  with  outliers  removed.  For  each  experiment,  we  report  the  detailed  analyses  for  the  original 
data  set  and  the  final  result  for  the  data  set  excluding  outliers. 


4.1. Case study 1: cells grown in the medium with 2.3 μM folic acid (Low FA experiment)

In the first experiment, called the Low FA experiment, the cells were grown in medium with 2.3 μM folic acid.
We fitted marginal dose-effect curves for TMQ and AG2034 by using both the median effect equation (E 3) and the
Emax model (E 5). The dose levels for TMQ when applied alone were $5.47 \times 10^{-6}$, $4.38 \times 10^{-5}$, $1.38 \times 10^{-4}$,
$4.38 \times 10^{-4}$, $8.75 \times 10^{-4}$, $1.75 \times 10^{-3}$, $3.5 \times 10^{-3}$, $7 \times 10^{-3}$, $2.21 \times 10^{-2}$, $7 \times 10^{-2}$, and 0.56 μM, and the
dose levels for AG2034 when applied alone were $2.71 \times 10^{-5}$, $2.71 \times 10^{-4}$, $6.87 \times 10^{-4}$, $2.17 \times 10^{-3}$,
$4.3 \times 10^{-3}$, $8.7 \times 10^{-3}$, $1.74 \times 10^{-2}$, $3.48 \times 10^{-2}$, 0.11, 0.3475, and 2.78 μM. Note that some effect readings at
low doses or combination doses are greater than 1; thus, the logit transformation cannot be carried out. Nonlinear least
squares regression was applied to estimate the parameters in models (E 3) and (E 5). Figure 3, Panels A and B show the
fitted marginal dose-effect curves for TMQ and AG2034, respectively, with the dose levels shown on a log scale, where
the dotted-dashed lines are the curves based on the median effect model (E 3), and the solid lines are the dose-effect
curves based on the Emax model (E 5). From the fitted dose-effect curves, we found that the Emax model provided a
much better fit than the median effect equation for the marginal data. Therefore, we chose the Emax model to describe
the dose-effect relationship in this case study. The parameters estimated for TMQ and AG2034 are shown in the three
columns under the title “Low FA” in Table 1. Here the estimate of $E_{max,TMQ}$ is slightly larger than the estimate of
$E_{max,AG2034}$. We plotted the distribution of the combination doses on the original scale (not shown) and found that
most of the combination doses were crowded in the low dose region, which could cause a singularity of the involved
matrices due to the low rank of $\Omega$ and $Z_1$ used for estimating the effect beyond additivity with the bivariate thin
plate splines in Section 3.3. Hence, we applied a log transformation of the form $\log(\text{dose}+\delta)$ for each dose level,
where $\delta$ is a small number, say $2.74 \times 10^{-6}$, half of the smallest dose level for the two drugs when applied alone. We
plotted the distribution of the combination doses on the $\log(\text{dose}+\delta)$ scale, which is shown in Figure 3, Panel C. In
Panel C, the points on the horizontal line are the doses of TMQ on the $\log(\text{dose}+\delta)$ scale, the points on the vertical
line are the doses of AG2034 on the $\log(\text{dose}+\delta)$ scale, and the points on each of the remaining 12 design rays are the
combination doses on that ray with each dose component on the $\log(\text{dose}+\delta)$ scale. The 12 design rays for
combination doses from left to right in Panel C correspond to the combination doses at 12 ratios of TMQ to AG2034,
i.e., 1:250, 1:125, 1:50, 1:20, 1:10, 1:5, 1:5, 2:5, 4:5, 2:1, 5:1, 10:1. The 12 rays from left to right in Panel C are denoted
by the letters E, F, G, H, I, J, K, L, M, N, O, P, representing the curves 15, 13, 11, 7, 5, 3, 9, 4, 6, 10, 12, 14 in the
original data set for the Low FA experiment. Note that the rays 3 and
9, denoted by J and K, have the same fixed dose ratio. To obtain the predicted additive effects, the procedure
described in Section 3.2 was applied, where the dose levels were kept on the original scale. The contour plot of the
predicted additive effect is shown in Figure 3, Panel D. Note that the effect levels for TMQ applied alone are in
$(1-E_{max,TMQ}, 1)$, which is (0.1190, 1), and the effect levels for AG2034 applied alone are in $(1-E_{max,AG2034}, 1)$, which is
(0.1312, 1). The vertical line with contour level 0.13 is the predicted effect produced by TMQ alone at such a dose
level. The plot of the differences between the observed effects and the predicted effects versus the dose levels of
AG2034 on the $\log(\text{dose}+\delta)$ scale is shown in Figure 3, Panel E. From Panel E, the differences are not distributed
around zero; specifically, the differences are significantly less than zero for some observations with AG2034 in the
range of (-7, -4) on the $\log(\text{dose}+\delta)$ scale with $\delta = 2.74 \times 10^{-6}$, i.e., in the range of 0.001 μM to 0.018 μM on the
original dose scale. Therefore, the pure additive effect model could not describe the data well. We used bivariate thin
plate splines to fit the differences at all single and combination doses versus the transformed doses, with the knots at
all the distinct dose levels. The transformation is taken as $\log(\text{dose}+\delta)$, where dose is a single or a combination dose.
By convention, there is no drug interaction when a single drug is applied. Therefore, the differences were set to zero
for the marginal doses. By applying the bivariate thin plate splines in Section 3.3, we obtain $\hat{\sigma}_\varepsilon^2 = 0.0041$,
$\hat{\sigma}_u^2 = 0.2318$, and $\lambda = \hat{\sigma}_\varepsilon^2/\hat{\sigma}_u^2 = 0.0178$.

Next, 95% point-wise upper and lower confidence surfaces were constructed based on equation (E 7). Figure 3, Panel F
shows the contour plot of the fitted spline function $f(d_1, d_2)$ at the levels of -0.1, 0, and 0.1 as thin solid lines, the
intercept lines of its corresponding 95% point-wise upper confidence surface with the dose plane as thick dashed lines,
and the intercept lines of its corresponding 95% point-wise lower confidence surface with the dose plane as thick solid
lines. The combination doses inside the thick dashed curves, painted light blue, are synergistic since the effects
beyond additivity at these combination doses are significantly smaller than zero; the combination doses inside the thick
solid curves, painted light pink, are antagonistic since the effects beyond additivity at these combination doses are
significantly larger than zero. The combination doses in the uncolored region, which lie between the thick solid curves
and the thick dashed curves, are additive since the effects beyond additivity are not significantly different from zero.
Specifically, the combination doses with AG2034 in the range of (-7, -4) on the transformed scale inside the thick
dashed line are synergistic, which is consistent with the residual plot in Panel E. The fitted response surface was
obtained by adding the fitted spline function $f$ (i.e., the effect beyond additivity) to the predicted additive surface, and
the contour plot of the fitted response surface at the contour levels of 0.2, 0.5, and 0.9 is shown in Figure 3, Panel I.

The final residuals were obtained by subtracting the fitted effects from the observed effects. The plots of final residuals
versus the dose levels of TMQ and AG2034 on the $\log(\text{dose}+\delta)$ scale are shown in Figure 3, Panels G and H,
respectively. From these two panels, we see that the residuals are centered around zero along the experimental dose
range. We conclude that the model fits the data reasonably well.
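
The $\log(\text{dose}+\delta)$ transformation used throughout this case study, and its inverse, can be sketched as follows. The natural logarithm is assumed here because it reproduces the back-transformed range quoted above (-7 and -4 map to roughly 0.001 μM and 0.018 μM); the function names are hypothetical.

```python
import math

DELTA = 2.74e-6  # half of the smallest single-drug dose, as stated in the text


def to_log_scale(dose, delta=DELTA):
    """Map a dose (in micromolar) to the log(dose + delta) scale; dose 0 stays finite."""
    return math.log(dose + delta)


def from_log_scale(x, delta=DELTA):
    """Inverse map from the log(dose + delta) scale back to the original dose."""
    return math.exp(x) - delta


if __name__ == "__main__":
    print(from_log_scale(-7.0), from_log_scale(-4.0))  # ends of the range read off Panel E
    print(to_log_scale(0.0))                            # position of the zero-dose axis
```

Adding $\delta$ before taking logarithms keeps the marginal (zero-dose) observations on the plot, which a plain log transformation would send to minus infinity.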

To examine the patterns of drug interactions in different rays and at different experimental combination doses,
we combined Panels F and I in Figure 3, that is, we added the contour curves of the fitted response surface at the
levels of 0.2, 0.5, and 0.9 to Panel F, along with the representative design rays and experimental combination doses as
dots on these rays, as shown in Figure 4, Panel A. From Figure 4, Panel A, the combination doses on the rays E through K
(Curves 15, 13, 11, 7, 5, 3, 9 in the original data set) are synergistic when the effect levels are between 0.9 and a number
smaller than 0.2. The combination doses on these rays are additive when the effect level is less than this small number,
and the combination doses at low dose levels on these rays are either additive or antagonistic. The combination doses on
the rays N, O, and P (Curves 10, 12, and 14 in the original data set) are additive when the effects are less than 0.9, and
the combination doses at low dose levels are antagonistic.

In addition to the 95% point-wise confidence surfaces, we also constructed the 95% simultaneous confidence
band with $\lambda = 0.0178$ and $\sqrt{EDF \times F^{\alpha}_{EDF,\,n-EDF}} = 12.20$, where $n = 761$, $EDF = 119$, and $\alpha = 0.05$. The resulting
patterns of drug interactions are shown in Figure 4, Panel B, where the thick dashed line is the intercept line of the 95%
upper simultaneous confidence surface with the dose plane. Based on Figure 4, Panel B, we conclude that the
combination doses inside the thick dashed curves, painted light blue, are synergistic. The combination doses outside
the thick dashed curves are additive. As can be seen, the synergistic area shrinks under the simultaneous confidence
band method compared to the point-wise confidence interval approach, and the antagonistic area disappears. A
point-wise confidence interval is appropriate for making inferences for each observed design ray. The simultaneous
confidence band is suitable for making a global assessment. However, it can be overly conservative.

In addition, we fitted the data set with outliers removed (19) for the Low FA experiment; the results for
assessing drug interactions are presented in Figure 4, Panels C and D. The information in Panel C is parallel to
that in Panel A, and the information in Panel D is parallel to that in Panel B. By comparing the plots across panels, we
conclude that the results from fitting the original data set and those from fitting the data set excluding outliers are very
similar. Therefore, the semiparametric method presented in Section 3 is robust to outliers in this example.

It should be noted that extrapolations based on spline estimates have to be considered with caution. The
fitted  response  surface  for  the  differences  between  the  observed  effects  and  predicted  effects  gives  an  overall  picture  of 
drug  interaction  (see  Figure  4,  Panel  A  or  Panel  B).  However,  the  fitted  results  on  the  two  larger  areas  outside  the 
experiment  rays  E  and  P  should  not  be  over-interpreted  since  there  is  no  experimental  data  in  such  areas  and  we  forced 
the  differences  of  the  observed  effects  and  predicted  additive  effects  to  be  zero  at  the  marginal  observed  dose  levels. 


4.2. Case study 2: cells grown in the medium with 78 μM folic acid (High FA experiment)

In the High FA experiment, the dose levels for TMQ when applied alone were $5.47 \times 10^{-6}$, $4.38 \times 10^{-5}$,
$1.38 \times 10^{-4}$, $4.38 \times 10^{-4}$, $8.75 \times 10^{-4}$, $1.75 \times 10^{-3}$, $3.5 \times 10^{-3}$, $7 \times 10^{-3}$, $2.21 \times 10^{-2}$, $7 \times 10^{-2}$, and 0.56 μM,
and the dose levels for AG2034 when applied alone were $2.71 \times 10^{-4}$, $2.17 \times 10^{-3}$, $6.87 \times 10^{-3}$, $2.17 \times 10^{-2}$,
$4.34 \times 10^{-2}$, $8.68 \times 10^{-2}$, $1.74 \times 10^{-1}$, $3.47 \times 10^{-1}$, 1.1, 3.47, and 27.8 μM. The procedure to analyze this data set
was the same as in case study 1.
By applying nonlinear least squares regression, we estimated the marginal dose-effect curves using the median effect
equation (E 3) (dotted-dashed lines) and the Emax model (E 5) (solid lines), shown in Figure 5, Panels A and B. It is
clear that the Emax model fitted the data better than the median effect equation; thus, we chose the Emax model as the
dose-effect curve for this data set. The estimated parameters of the marginal dose-effect curves for the Emax model are
shown in the three columns under the title “High FA” in Table 1. The combination doses on the original scale (not
shown) are crowded in the low dose region; thus we applied the transformation $\log(\text{dose}+\delta)$ to each dose level,
where $\delta$ is a small number, say $2.74 \times 10^{-6}$, one half of the lowest dose level for TMQ and AG2034 when applied
alone. The distribution of the experimental dose levels on the $\log(\text{dose}+\delta)$ scale is shown in Figure 5, Panel C.
The 12 design rays for the combination doses correspond to the 12 dose ratios of TMQ versus AG2034 at 1:2500,
1:1250, 1:500, 1:200, 1:100, 1:50, 1:50, 1:25, 1:12.5, 1:5, 1:2, 1:1, which are denoted by the letters E, F, G, H, I, J, K,
L, M, N, O, and P, representing the curves 15, 13, 11, 7, 5, 3, 9, 4, 6, 10, 12, 14 in the original data set for the High FA
experiment. By applying the procedure described in Section 3.2, we obtained the contour plot of the predicted additive
effect shown in Figure 5, Panel D. In particular, the contour line at level 0.15 is the predicted effect produced by TMQ
alone, since AG2034 could not produce such an effect when applied alone; the effect levels for AG2034 applied
alone ranged from 0.1816 to 1. Figure 5, Panel E shows the differences between the observed effects and the predicted
effects versus the dose levels of AG2034 on the $\log(\text{dose}+\delta)$ scale. From Panel E, the differences are not centered
around zero; specifically, they are significantly less than zero for some observations with AG2034 in the range of
(-5, 0) on the $\log(\text{dose}+\delta)$ scale, i.e., in the range of $6.7 \times 10^{-3}$ μM to 1.0 μM on the original dose scale, indicating
that some combination doses were synergistic and the pure additive effect model could not describe the data well. We
used bivariate thin plate splines to fit these differences versus the transformed doses or combination doses with the
knots at all distinct dose levels. The transformation is taken as $\log(\text{dose}+\delta)$, where dose is a single dose or a
combination dose.
We  constructed  its  95%  point- wise  confidence  surfaces  based  on  equation  (E  7).  The  estimated 
a]  =  0.0066,  =  0.0779 ,  and  X=g2£  /of  =  0.0842.  Figure  5,  Panel  F  shows  the  contour  plot  of  the  fitted  spline 

function/  at  levels  of  -0.1,  0,  and  0.1  as  thin  solid  lines,  the  intercept  lines  of  its  corresponding  95%  point-wise  upper 
confidence  surface  with  the  dose  plane  as  thick  dashed  lines,  and  the  intercept  lines  of  its  corresponding  95%  point- 
wise  lower  confidence  surface  with  the  dose  plane  as  thick  solid  lines.  The  combination  doses  inside  the  thick  dashed 
curves,  painted  as  light  blue,  are  synergistic;  the  combination  doses  inside  the  thick  solid  curves,  painted  as  light  pink, 
are  antagonistic,  while  the  combination  doses  in  the  uncolored  area  are  additive.  The  fitted  response  surface  was 
obtained  by  adding  the  fitted  spline  function/to  the  predicted  additive  surface,  which  is  shown  in  Figure  5,  Panel  I. 
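The colouring rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' S-PLUS code: the function name `classify` and the example numbers are hypothetical, and it assumes, as in the text, that a fitted difference significantly below zero indicates synergy and one significantly above zero indicates antagonism.

```python
def classify(f_hat, half_width):
    """Label one combination dose from the fitted effect beyond additivity.

    f_hat      -- fitted difference (observed minus predicted additive effect)
    half_width -- half-width of the 95% confidence interval at that dose
    """
    lower = f_hat - half_width
    upper = f_hat + half_width
    if upper < 0:
        return "synergistic"    # whole interval lies below zero
    if lower > 0:
        return "antagonistic"   # whole interval lies above zero
    return "additive"           # interval covers zero


# Hypothetical (fitted value, half-width) pairs, not taken from the data.
examples = [(-0.12, 0.05), (0.10, 0.04), (-0.03, 0.05)]
labels = [classify(f, w) for f, w in examples]
print(labels)
```

The same rule applies whether the half-width comes from a point-wise interval or from a simultaneous band; only the width changes.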

The plots of the final residuals versus the dose levels of TMQ and AG2034 on the log(dose+δ) scale are shown in
Figure 5, Panels G and H, respectively. From these two panels, the residuals are centered around zero along the
experimental dose range, indicating that the model describes the data reasonably well.

To  examine  the  patterns  of  drug  interactions  in  different  rays  and  different  experimental  combination  doses, 
we  combined  Panels  F  and  I  in  Figure  5  to  form  Figure  6,  Panel  A,  as  we  did  for  analyzing  the  Low  FA  experiment 
data. From Figure 6, Panel A, the combination doses on all 12 rays are synergistic when the effect levels are between
0.9 and 0.15. The combination doses at high dose levels are additive, and most of the combination doses at low dose
levels are additive. In addition, we constructed a 95% simultaneous confidence band based on equation (E 7) with
$z_{\alpha/2}$ replaced by $\sqrt{\mathrm{EDF} \times F^{\alpha}_{\mathrm{EDF},\, n-\mathrm{EDF}}}$. Here EDF = 91, n = 769, and
$\sqrt{\mathrm{EDF} \times F^{\alpha}_{\mathrm{EDF},\, n-\mathrm{EDF}}} = 10.77$. The results are
presented in Figure 6, Panel B, where the thick dashed line is the intercept line of the upper 95% simultaneous
confidence  surface  with  the  dose  plane.  Based  on  Figure  6,  Panel  B,  we  conclude  that  the  combination  doses  inside  the 
thick  dashed  curves,  painted  as  light  blue,  are  synergistic.  The  combination  doses  outside  the  thick  dashed  curves  are 
additive.  Again,  the  simultaneous  confidence  band  yields  more  conservative  results  and  is  more  suitable  for  the  global 
assessment.  In  addition,  we  fitted  the  data  set  with  outliers  removed  for  the  High  FA  experiment.  The  results  for 
assessing  drug  interactions  are  presented  in  Figure  6,  Panel  C  and  Panel  D.  By  comparing  Panel  C  to  Panel  A,  and 
Panel  D  to  Panel  B,  we  conclude  that  the  results  from  fitting  the  original  data  set  and  those  from  fitting  the  data  set 
excluding  outliers  are  very  similar.  Thus,  the  results  indicate  that  the  semiparametric  method  is  robust  to  outliers. 
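The following arithmetic sketch shows why the simultaneous band is so much more conservative. It uses only the figures reported above (EDF = 91, n = 769, critical value 10.77) and assumes that the point-wise surfaces use the standard normal quantile for a 95% level; the variable names are illustrative.

```python
from statistics import NormalDist

EDF = 91                    # effective degrees of freedom reported in the text
N_OBS = 769                 # number of observations reported in the text
SIMULTANEOUS_CRIT = 10.77   # reported critical value for the simultaneous band

# Assumption: the point-wise interval uses the two-sided 95% normal quantile.
pointwise_crit = NormalDist().inv_cdf(0.975)

# F quantile implied by the reported critical value, with (EDF, n - EDF) df.
denominator_df = N_OBS - EDF
implied_f_quantile = SIMULTANEOUS_CRIT ** 2 / EDF

# How much wider the simultaneous band is than the point-wise interval.
width_ratio = SIMULTANEOUS_CRIT / pointwise_crit

print(denominator_df, round(implied_f_quantile, 3), round(width_ratio, 1))
```

Under these assumptions the simultaneous band is roughly five and a half times as wide as the point-wise interval at every dose, which is consistent with far fewer combination doses being declared synergistic in Panel B than in Panel A.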


5.  DISCUSSION 

We extended the approach proposed by Kong and Lee (6) to the case where the Emax model is more
appropriate to describe the marginal dose-effect relationship. It is not unusual for some effect readings at low
doses to exceed the mean of the controls. In this case, the standardized effect is greater than 1, and a logit
transformation to a linear model (4, 8, 9) cannot be carried out. Hence, other models such as the Emax model are needed,
and nonlinear regression methods can be applied to estimate the parameters of the dose-effect curves. In the case
studies in Section 4, nonlinear least squares regression was applied to estimate the parameters for the dose-effect curves
specified by the median effect equation and the Emax model.
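A short sketch makes the failure of the logit transformation concrete. The model equations (E 3) and (E 5) are not reproduced in this section, so the functional forms below are assumptions: the logit is the usual linearization of the median effect equation, and the Emax curve is assumed to fall from 1 at zero dose toward 1 - Emax at high doses. The parameter values are the Low FA estimates for TMQ from Table 1.

```python
import math


def logit_survival(effect):
    """Linearizing transform of the median effect equation: log((1 - E) / E)."""
    return math.log((1.0 - effect) / effect)


def emax_effect(dose, emax, ed50, m):
    """Assumed Emax form: effect decreases from 1 toward 1 - emax."""
    if dose == 0:
        return 1.0
    ratio = (dose / ed50) ** m
    return 1.0 - emax * ratio / (1.0 + ratio)


# A standardized effect above 1 (reading above the control mean) breaks the logit.
try:
    logit_survival(1.05)
    logit_defined = True
except ValueError:
    logit_defined = False
print("logit defined for effect 1.05:", logit_defined)

# The Emax curve with the Low FA TMQ estimates from Table 1.
at_ed50 = emax_effect(0.0013, emax=0.8810, ed50=0.0013, m=2.2496)
at_high_dose = emax_effect(1.0, emax=0.8810, ed50=0.0013, m=2.2496)
print(at_ed50, at_high_dose)
```

Because the logit requires an effect strictly between 0 and 1, any reading above the control mean has to be dropped or truncated before a linear fit, whereas a nonlinear least squares fit of the Emax curve can use the raw readings.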

Another extension of the approach by Kong and Lee (6) in this paper is a solution to the problem that arises when
the experimental points are very close together, so that the low rank of the design matrix may cause computational
problems in matrix inversion. In this case, one may consider a low-rank thin plate spline (14) to estimate the surface beyond
additivity, or alternatively, one may apply an appropriate transformation to the doses so that the combination doses on
the transformed scale are more evenly distributed. In our case studies, we first applied the transformation log(dose+δ)
to each component of the combination doses and then applied bivariate thin plate splines with knots at all the
distinct observed doses on the log(dose+δ) scale. In both case studies, we chose δ as half of the smallest non-zero
dose among TMQ and AG2034 when applied alone, that is, δ = 2.74 × 10^-6 for both experiments. The δ should not be
selected too small or too large compared with the magnitude of the dose levels. An extremely small δ will result in a
relatively large distance between the marginal doses and combination doses. Conversely, a large δ will dominate in the
transformation log(dose+δ) when the dose levels are low. From the final residual plots, it is evident that the current
transformation works well.
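The effect of the choice of δ can be checked numerically. The sketch below uses the reported δ = 2.74 × 10^-6 and assumes, as the text implies, that the smallest non-zero single-agent dose equals 2δ; the helper names are hypothetical.

```python
import math

DELTA = 2.74e-6  # value reported in the text for both experiments


def to_log_scale(dose, delta=DELTA):
    """Transform a dose to the log(dose + delta) scale; defined at dose 0."""
    return math.log(dose + delta)


def from_log_scale(value, delta=DELTA):
    """Invert the transformation back to the original dose scale."""
    return math.exp(value) - delta


def gap_to_zero(smallest_dose, delta):
    """Distance on the transformed scale between dose 0 and the smallest non-zero dose."""
    return math.log(smallest_dose + delta) - math.log(delta)


smallest = 2 * DELTA  # implied by delta being half the smallest non-zero dose

print(to_log_scale(0.0))                              # zero dose maps to a finite value
print(from_log_scale(-5.0), from_log_scale(0.0))      # the (-5, 0) range on the dose scale
print(gap_to_zero(smallest, DELTA))                   # gap with the chosen delta
print(gap_to_zero(smallest, DELTA / 1000))            # gap with a much smaller delta
```

With the chosen δ, the zero dose sits log 3 (about 1.1 units) below the smallest non-zero dose on the transformed scale. Shrinking δ by a factor of 1000 stretches that gap to about 7.6 units, which is the "relatively large distance" between marginal and combination doses that the text warns about.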

It is well known that the smoothing parameter λ governs the trade-off between the goodness-of-fit and the
smoothness of the function f. When λ becomes larger, the fitted function f tends to be smoother and the residuals tend
to be larger. The selection of the smoothing parameter plays a key role in the fitted results. In our case studies, the
smoothing parameter, λ, was selected as $\hat\sigma_\varepsilon^2 / \hat\sigma_u^2$, which is almost identical to the selected
smoothing parameter based on the generalized cross validation (GCV) criterion and the "leave-one-out" cross validation (CV)
criterion. For example, for the Low FA experimental data, the selected parameters based on the mixed model approach, CV, and
GCV were 0.0178, 0.0112, and 0.0071, respectively, while for the High FA experimental data, the corresponding selected
parameters were 0.0842, 0.0842, and 0.0531, respectively. Indeed, Kohn, Ansley, and Tharm (20) showed that the
estimation of the smoothing parameter based on a mixed model approach is comparable with the standard method of
GCV. By applying a mixed effects model, the smoothing parameter can be automatically determined by
$\hat\sigma_\varepsilon^2 / \hat\sigma_u^2$.
This method has been implemented in S-PLUS by Ruppert et al. (15) by using the function lme (21). In our previous
study (6), based on extensive simulations, we showed that this selection of the smoothing parameter provides a good
estimate of the underlying function in general.
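The trade-off governed by λ can be illustrated with a much simpler smoother than the one used in the paper. The sketch below is not a thin plate spline: it is a one-dimensional discrete penalized least squares fit, chosen only because it needs no libraries, and the data vector is hypothetical. It minimizes the residual sum of squares plus λ times the sum of squared first differences.

```python
def smooth(y, lam):
    """Minimize sum (y_i - f_i)^2 + lam * sum (f_{i+1} - f_i)^2.

    The normal equations (I + lam * D'D) f = y are tridiagonal and are
    solved with the Thomas algorithm.
    """
    n = len(y)
    diag = [1.0 + lam * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = -lam
    c = [0.0] * n   # modified super-diagonal
    d = [0.0] * n   # modified right-hand side
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (y[i] - off * d[i - 1]) / denom
    f = d[:]
    for i in range(n - 2, -1, -1):
        f[i] = d[i] - c[i] * f[i + 1]
    return f


def rss(y, f):
    """Residual sum of squares (goodness-of-fit)."""
    return sum((a - b) ** 2 for a, b in zip(y, f))


def roughness(f):
    """Sum of squared first differences (lack of smoothness)."""
    return sum((f[i + 1] - f[i]) ** 2 for i in range(len(f) - 1))


# Hypothetical differences between observed and predicted effects.
y = [0.02, -0.05, -0.12, -0.08, -0.15, -0.03, 0.01, 0.04]

# 0.0842 is the value selected for the High FA data; the others are for contrast.
for lam in (0.0842, 1.0, 10.0):
    f = smooth(y, lam)
    print(lam, round(rss(y, f), 5), round(roughness(f), 5))

# Ratio of the rounded variance components reported for the High FA fit.
lam_from_variances = 0.0066 / 0.0779
```

As λ grows, the residual sum of squares increases while the roughness decreases, which is the behaviour described above. The ratio of the rounded variance components, 0.0066 / 0.0779, is about 0.085, matching the reported 0.0842 up to rounding.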

In  the  two  case  studies,  we  also  performed  the  same  analyses  for  the  two  reduced  data  sets  analyzed  by  Lee  et 
al.  (19),  and  the  results  were  almost  identical,  which  indicates  that  the  semiparametric  method  developed  here  is  robust 
to  outliers.  The  semiparametric  method  can  also  assess  drug  interactions  for  the  combination  doses  not  on  the  design 
rays,  and  identify  complex  patterns  of  drug  interaction  in  combination  studies.  In  addition,  the  semiparametric  method 
gives  an  overall  assessment  of  the  combination  effect  in  the  entire  two-dimensional  dose  space  spanned  by  the 
experimental  doses  with  a  caveat  that  extrapolation  beyond  data  points  can  be  risky. 

Last but not least, we would like to point out that the estimated function f(d1, d2) and its 95% confidence
surfaces can guide the exploration of whether some parametric models are sufficient to describe the data. In the
literature, many parametric models have been proposed. Greco, Bravo, and Parsons (22) gave an excellent review of
the response surface approach. However, when one has no prior knowledge of the response surface model, or when the
data cannot be adequately represented by a parametric model, most parametric approaches will fail. Blindly using any
parametric model can be dangerous and may lead to wrong conclusions about drug interactions. In our proposed
approach, there is no need to assume any parametric model for f(d1, d2). We provide a promising approach by modeling
the mixture effect data with spline techniques via a mixed-effect model. We advocate the use of the semiparametric
method for model building because we typically do not know the true patterns of drug interactions. The conclusions about
drug interactions are based on the estimated f and its confidence surfaces, which are determined by the underlying data.
The S-PLUS code for the current case studies can be obtained from the first author.
Table 1: The estimated parameters for the Emax models in the two case studies: the first three columns are the
estimated parameters for the marginal dose effect curves in the Low FA experiment, and the last three columns are the
estimated parameters for the marginal dose effect curves in the High FA experiment

            Low FA                                               High FA
Drug name   Emax             ED50             Slope m          Emax             ED50             Slope m
TMQ         0.8810 (0.0161)  0.0013 (0.0001)  2.2496 (0.2330)  0.8847 (0.0326)  0.0134 (0.0015)  3.7230 (0.7323)
AG2034      0.8688 (0.0154)  0.0060 (0.0003)  3.1644 (0.3703)  0.8184 (0.0311)  0.4700 (0.0540)  1.6869 (0.2400)

[Figure 1 panel titles: A: Typical Emax model; B: Dose effect curves with different maximal effects]

Figure 1: Panel A shows a typical dose effect curve with the maximum effect, i.e., Emax, being less than 1. ED50 in
Panel A is the dose required to produce half of the maximum effect, i.e., $E = 1 - 0.5E_{\max}$. Panel B shows two dose effect
curves with different maximum effects, say, $E_{\max,1} > E_{\max,2}$. In Panel B, drug 1 at a particular dose level produces the
maximum effect produced by drug 2 alone.


[Figure 2 axis label: Drug 1]

Figure 2: Panel A shows an additive isobole under the Loewe additivity model. Any combination dose $(d_1, d_2)$ on the
line PQ produces the same effect as drug 1 alone at dose $D_{y,1}$ (i.e., $d_1 + \rho(y) d_2$), or drug 2 alone at dose $D_{y,2}$ (i.e.,
$\rho(y)^{-1} d_1 + d_2$), where y is the predicted effect for any combination dose on the line PQ, and $\rho(y)$ is the relative potency
at the effect level y. Panel B shows that the additive isoboles associated with the effect levels in $(1 - E_{\max,2}, 1)$ cover the
region between the two solid vertical lines under the assumption $E_{\max,1} > E_{\max,2}$. Each dashed line corresponds to an isobole.
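The two equivalent single-agent doses quoted in the caption follow from the isobole equation in one step. The derivation below is a sketch that assumes the standard Loewe additivity isobole through the points $(D_{y,1}, 0)$ and $(0, D_{y,2})$, and takes the relative potency to be the ratio of the two equally effective single-agent doses.

```latex
\frac{d_1}{D_{y,1}} + \frac{d_2}{D_{y,2}} = 1,
\qquad
\rho(y) = \frac{D_{y,1}}{D_{y,2}}
\;\Longrightarrow\;
d_1 + \rho(y)\, d_2 = D_{y,1},
\qquad
\rho(y)^{-1} d_1 + d_2 = D_{y,2}.
```

Multiplying the isobole equation by $D_{y,1}$ gives the first identity, and multiplying it by $D_{y,2}$ gives the second.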


[Figure 3 image not reproduced. Recoverable panel titles: B, Dose-response curves; C, Distribution of log(dose)'s;
E, Obs.-Pred. vs AG2034; F, Effect beyond additivity; G, Residuals vs TMQ (Final), horizontal axis log(TMQ);
H, Residuals vs AG2034 (Final), horizontal axis log(AG2034). Additional title: Dose-effect curves (Low FA).]


Figure 3: Results from analyzing the Low FA experimental data. Panels A and B show the fitted marginal dose effect
curves for TMQ and AG2034, respectively, where the dotted-dashed line in each panel is the fitted dose effect curve
based on the median effect equation (E 3), while the solid line in each panel is the fitted dose effect curve based on the
Emax model (E 5). Panel C shows the distribution of the experimental doses and combination doses on the log(dose+δ)
scale with δ = 2.74 × 10^-6, along with the 12 rays from left to right with dose ratios of TMQ versus AG2034 at 1:250,
1:125, 1:50, 1:20, 1:10, 1:5, 1:5, 2:5, 4:5, 2:1, 5:1, and 10:1, denoted by the letters E, F, G, H, I, J, K, L, M, N, O, and P,
representing the curves 15, 13, 11, 7, 5, 3, 9, 4, 6, 10, 12, and 14 in the original data set. Panel D shows the contour plot
of the predicted additive effect, while Panel E shows the plot of the differences between the observed effects and the
predicted effects versus the dose level of AG2034 on the log(dose+δ) scale. Panel F shows the contour plot of the
fitted effect beyond the additivity effect at levels -0.1, 0, and 0.1 as thin solid lines, along with the intercept line of the
upper 95% point-wise confidence surface with the dose plane as thick dashed lines and the intercept line of the lower
95% point-wise confidence surface with the dose plane as thick solid lines. In Panel F, the combination doses in the
light blue area are synergistic, the combination doses in the light pink area are antagonistic, and the combination doses
in the uncolored area are additive. The colored lines in Panels C and I are the design rays. Panels G and H are the plots
of the final residuals versus TMQ and AG2034 on the log(dose+δ) scale, respectively, and Panel I is the contour plot
of the fitted response surface at the levels of 0.9, 0.5, and 0.2, along with some representative design rays.
Figure 4: Different patterns of drug interactions for the Low FA experimental data based on 95% point-wise
confidence  intervals  (Panel  A)  and  95%  simultaneous  confidence  bands  (Panel  B).  Panel  A  is  the  combination  of 
Figure  3  Panel  F  and  Panel  I,  along  with  the  design  points  shown  as  dots  on  each  ray.  The  thin  solid  lines  are  the 
contour  lines  of  the  fitted  effect  surface  beyond  the  additivity  surface  at  the  levels  of  -0.1,  0,  and  0.1,  the  thick  dashed 
lines  are  the  intercept  lines  of  the  upper  95%  point-wise  confidence  surface  with  the  dose  plane,  and  the  thick  solid 
lines  are  the  intercept  lines  of  the  lower  95%  point-wise  confidence  surface  with  the  dose  plane.  The  colored  lines  with 
the  letters  “E”,  “G”,  “J,  K”,  “N”,  and  “P”  are  the  representatives  of  the  design  rays.  The  red  dotted-dashed  lines  are  the 
contour  lines  of  the  fitted  response  surface  at  the  levels  of  0.9,  0.5,  and  0.2.  Based  on  Panel  A,  the  combination  doses  in 
the  light  blue  area  are  synergistic,  the  combination  doses  in  the  light  pink  area  are  antagonistic,  and  the  combination 
doses in the uncolored area are additive. Panel B presents the same information as Panel A except that the thick dashed
lines  are  the  intercept  lines  of  the  upper  95%  simultaneous  confidence  surface  with  the  dose  plane  and  there  are  no 
intercept  lines  for  the  lower  95%  simultaneous  confidence  surface  with  the  dose  plane.  Based  on  Panel  B,  the 
combination  doses  inside  the  dashed  lines  are  synergistic,  otherwise  additive.  Panel  B  gives  more  conservative  results 
for  assessing  drug  interactions.  Panel  C  and  Panel  D  are  the  results  from  fitting  the  data  set  excluding  outliers  for  the 
Low  FA  experiment,  where  the  information  in  Panel  C  is  parallel  to  that  in  Panel  A,  and  the  information  in  Panel  D  is 
parallel  to  that  in  Panel  B. 
Figure 5: Results from analyzing the High FA experimental data. Panels A and B show the fitted marginal dose effect
curves for TMQ and AG2034, respectively, where the dotted-dashed line in each panel is the fitted dose effect curve
based on the median effect equation (E 3), while the solid line in each panel is the fitted dose effect curve based on the
Emax model (E 5). Panel C shows the distribution of the experimental doses and combination doses on the log(dose+δ)
scale with δ = 2.74 × 10^-6, along with the 12 rays from left to right with dose ratios of TMQ versus AG2034 at 1:2500,
1:1250, 1:500, 1:200, 1:100, 1:50, 1:50, 1:25, 1:12.5, 1:5, 1:2, and 1:1, denoted by the letters E, F, G, H, I, J, K, L, M, N,
O, and P, representing the curves 15, 13, 11, 7, 5, 3, 9, 4, 6, 10, 12, and 14 in the original data set. Panel D shows the contour
plot of the predicted additive effect, while Panel E shows the plot of the differences between the observed effects and
the predicted effects versus the dose level of AG2034 on the log(dose+δ) scale. Panel F shows the contour plot of the
fitted effect beyond the additivity effect at levels -0.1, 0, and 0.1, along with the intercept line of the upper confidence
surface with the dose plane as thick dashed lines. In Panel F, the combination doses in the light blue area are synergistic,
the combination doses in the light pink area are antagonistic, and the combination doses in the uncolored area are
additive. The colored lines in Panels C and I are the representatives of the design rays. Panels G and H are the plots of the
final residuals versus TMQ and AG2034 on the log(dose+δ) scale, respectively, and Panel I is the contour plot of the
fitted response surface at the levels of 0.9, 0.5, and 0.15, along with some representative design rays.
Figure 6: Different patterns of drug interactions for the High FA experiment based on 95% point-wise confidence
intervals  (Panel  A)  and  95%  simultaneous  confidence  bands  (Panel  B).  Panel  A  is  the  combination  of  Figure  5  Panels  F 
and  I,  along  with  the  design  points  shown  as  dots  on  each  ray.  The  thin  solid  lines  are  the  contour  lines  of  the  fitted 
effect  surface  beyond  the  additivity  surface  at  the  levels  of  -0.1,  0,  and  0.1,  the  thick  dashed  lines  are  the  intercept  lines 
of  the  upper  95%  point-wise  confidence  surface  with  the  dose  plane,  and  the  thick  solid  lines  are  the  intercept  lines  of 
the lower 95% point-wise confidence surface with the dose plane. The colored lines with the letters “E”, “G”, “J, K”,
“N”,  and  “P”  are  the  representatives  of  the  design  rays.  The  red  dotted-dashed  lines  are  the  contour  lines  of  the  fitted 
response  surface  at  the  levels  of  0.9,  0.5,  and  0.15.  In  Panel  A,  the  combination  doses  in  the  light  blue  area  are 
synergistic,  the  combination  doses  in  the  light  pink  area  are  antagonistic,  and  the  combination  doses  in  the  uncolored 
area  are  additive.  Panel  B  gives  the  same  information  as  Panel  A  except  that  the  thick  dashed  lines  are  the  intercept 
lines  of  the  upper  95%  simultaneous  confidence  surface  with  the  dose  plane.  Based  on  Panel  B,  the  combination  doses 
inside  the  dashed  lines  are  synergistic,  otherwise  additive.  Panel  B  gives  more  conservative  results  for  assessing  drug 
interactions.  Panel  C  and  Panel  D  are  the  results  from  fitting  the  data  set  excluding  outliers  for  the  High  FA  experiment, 
where  the  information  in  Panel  C  is  parallel  to  that  in  Panel  A,  and  the  information  in  Panel  D  is  parallel  to  that  in 
Panel  B. 
Table 1. Summary of parameter estimates (Standard Error) for the Low FA case

Curve    Dose ratio  Emax            ED50                  m               Residual sum
         (TMQ/AG)                                                          of squares
A (8)
B (16)
C (1)                0.877 (0.007)   0.00133 (0.00006)     2.345 (0.190)   0.0779
D (2)                0.872 (0.007)   0.00621 (0.00024)     3.045 (0.269)   0.0749
E (15)   0.004       0.869 (0.008)   0.00359 (0.00017)     3.250 (0.437)   0.0969
F (13)   0.008       0.863 (0.008)   0.00294 (0.00014)     2.621 (0.276)   0.0897
G (11*)  0.02        0.865 (0.006)   0.00151 (0.00005)     5.0             0.0817
H (7*)   0.05        0.889 (0.007)   0.00274 (0.00011)     4.5             0.1025
I (5)    0.1         0.885 (0.005)   0.00253 (0.00009)     3.449 (0.306)   0.0689
J (3)    0.2         0.882 (0.005)   0.00244 (0.00007)     4.019 (0.402)   0.0655
K (9*)   0.2         0.872 (0.007)   0.00233 (0.00007)     5.0             0.0843
L (4)    0.4         0.889 (0.006)   0.00278 (0.00011)     5.473 (0.583)   0.0855
M (6)    0.8         0.890 (0.005)   0.00200 (0.00007)     3.208 (0.263)   0.0738
N (10)   2           0.887 (0.008)   0.00169 (0.00009)     2.544 (0.258)   0.0984
O (12)   5           0.878 (0.008)   0.00145 (0.00007)     2.206 (0.206)   0.0837
P (14)   10          0.874 (0.006)   0.00134 (0.00006)     1.971 (0.128)   0.0599

* m is fixed at a certain value


Table 2. Estimated interaction index (II) and its 95% confidence interval (CI) at each dose combination for the Low
FA case

Curve    TMQ dose     AG2034 dose  Dose ratio  Dilution  Predicted  II       95% CI for II
                                   (TMQ/AG)              effect              Lower limit  Upper limit
A (8)
B (16)
C (1)
D (2)
E (15)   1.07E-07     2.66E-05     0.004       1         1          0.87     0.18         4.29
E (15)   8.58E-07     0.000213     0.004       2         0.9999     0.73     0.28         1.89
E (15)   2.71E-06     0.000673     0.004       3         0.9962     0.67     0.37         1.20
E (15)   8.58E-06     0.002129     0.004       4         0.864      0.61     0.48         0.78
E (15)   1.72E-05     0.004259     0.004       5         0.4454     0.58     0.52         0.65
E (15)   3.43E-05     0.008517     0.004       6         0.1802     0.56     0.45         0.71
E (15)   6.86E-05     0.017000     0.004       7         0.1368     0.61     0.24         1.55
E (15)   0.000137     0.034100     0.004       8         0.1319     0.91     0            4.35E+03
E (15)   0.000434     0.107700     0.004       9         0.1314     2.71     0            1.30E+157
E (15)   0.001373     0.340700     0.004       10        0.1314     8.58     0            NA
E (15)   0.011000     2.725500     0.004       11        0.1314     68.6     0            NA
F (13)   2.10E-07     2.61E-05     0.008       1         1          0.28     0.08         0.91
F (13)   1.68E-06     0.000209     0.008       2         0.9991     0.35     0.17         0.71
F (13)   5.32E-06     0.000660     0.008       3         0.9828     0.4      0.26         0.62
F (13)   1.68E-05     0.002088     0.008       4         0.746      0.47     0.4          0.55
F (13)   3.37E-05     0.004177     0.008       5         0.3788     0.52     0.46         0.58
F (13)   6.73E-05     0.008353     0.008       6         0.188      0.59     0.47         0.74
F (13)   0.000135     0.016700     0.008       7         0.1454     0.76     0.4          1.47
F (13)   0.000269     0.033400     0.008       8         0.138      1.26     0.02         67.11
F (13)   0.000851     0.105700     0.008       9         0.1366     3.79     0            1.72E+37
F (13)   0.002692     0.334100     0.008       10        0.1366     11.95    0            NA
F (13)   0.021500     2.673100     0.008       11        0.1366     95.63    0            NA
G (11)   4.97E-07     2.47E-05     0.02        1         1.0000     5.60     1.90         16.45
G (11)   3.98E-06     0.000197     0.02        2         1.0000     1.09     0.60         1.96
G (11)   1.26E-05     0.000624     0.02        3         0.9885     0.47     0.36         0.63
G (11)   3.98E-05     0.001974     0.02        4         0.2987     0.22     0.19         0.25
G (11)   7.95E-05     0.003949     0.02        5         0.1410     0.17     0.09         0.32
G (11)   0.000159     0.007898     0.02        6         0.1350     0.27     0.00         3.09E+04
G (11)   0.000318     0.015800     0.02        7         0.1348     0.54     0.00         1.62E+161
G (11)   0.000636     0.031600     0.02        8         0.1348     1.09     0.00         NA
G (11)   0.002012     0.099900     0.02        9         0.1348     3.44     0.00         NA
G (11)   0.006364     0.315900     0.02        10        0.1348     10.87    0.00         NA
G (11)   0.050900     2.527300     0.02        11        0.1348     86.95    0.00         NA
H (7)    1.09E-06     2.17E-05     0.05        1         1.0000     12.11    3.53         41.51
H (7)    8.75E-06     0.000174     0.05        2         1.0000     2.71     1.42         5.15
H (7)    2.77E-05     0.000549     0.05        3         0.9992     1.29     0.88         1.90
H (7)    8.75E-05     0.001738     0.05        4         0.8773     0.65     0.57         0.74
H (7)    0.000175     0.003475     0.05        5         0.3035     0.43     0.38         0.48
H (7)    >=0.000350   >=0.006950   0.05        6-11      <=0.1219   NA       NA           NA
I (5)    1.82E-06     1.81E-05     0.1         1         1.0000     2.39     0.70         8.17
I (5)    1.46E-05     0.000145     0.1         2         0.9999     1.17     0.59         2.30
I (5)    4.61E-05     0.000458     0.1         3         0.9966     0.83     0.55         1.24
I (5)    0.000146     0.001448     0.1         4         0.8509     0.61     0.52         0.71
I (5)    0.000292     0.002896     0.1         5         0.3906     0.51     0.47         0.55
I (5)    0.000583     0.005792     0.1         6         0.1506     0.38     0.31         0.47
I (5)    >=0.001167   >=0.011600   0.1         7-11      <=0.1188   NA       NA           NA
J (3)    2.73E-06     1.36E-05     0.2         1         1.0000     12.56    2.59         60.87
J (3)    2.19E-05     0.000109     0.2         2         1.0000     3.31     1.38         7.95
J (3)    6.92E-05     0.000343     0.2         3         0.9993     1.67     0.99         2.81
J (3)    0.000219     0.001086     0.2         4         0.9344     0.88     0.71         1.08
J (3)    0.000438     0.002172     0.2         5         0.5008     0.61     0.57         0.65
J (3)    0.000875     0.004344     0.2         6         0.1577     0.40     0.33         0.48
J (3)    >=0.001750   >=0.008688   0.2         7-11      <=0.1204   NA       NA           NA
K (9)    2.73E-06     1.36E-05     0.2         1         1.0000     88.77    14.80        532.45
K (9)    2.19E-05     0.000109     0.2         2         1.0000     9.70     3.84         24.48
K (9)    6.92E-05     0.000343     0.2         3         0.9998     3.04     1.83         5.06
K (9)    0.000219     0.001086     0.2         4         0.9550     1.02     0.86         1.21
K (9)    0.000438     0.002172     0.2         5         0.4457     0.55     0.51         0.60
K (9)    0.000875     0.004344     0.2         6         0.1429     0.32     0.21         0.47
K (9)    >=0.001750   >=0.008688   0.2         7-11      <=0.1280   NA       NA           NA
L (4)    3.65E-06     9.05E-06     0.4         1         1.0000     812.88   82.77        7.98E+03
L (4)    2.92E-05     7.24E-05     0.4         2         1.0000     53.71    13.70        210.53
L (4)    9.22E-05     0.000229     0.4         3         1.0000     12.38    5.18         29.56
L (4)    0.000292     0.000724     0.4         4         0.9964     2.99     1.98         4.51
L (4)    0.000583     0.001448     0.4         5         0.8651     1.31     1.10         1.56
L (4)    0.001167     0.002896     0.4         6         0.2103     0.56     0.50         0.63
L (4)    >=0.002333   >=0.005792   0.4         7-11      <=0.1134   NA       NA           NA
M (6)    4.38E-06     5.43E-06     0.8         1         1.0000     4.95     1.18         20.69
M (6)    3.50E-05     4.34E-05     0.8         2         1.0000     2.40     1.01         5.71
M (6)    0.000111     0.000137     0.8         3         0.9989     1.63     0.93         2.87
M (6)    0.000350     0.000434     0.8         4         0.9580     1.12     0.85         1.48
M (6)    0.000700     0.000869     0.8         5         0.7206     0.90     0.80         1.03
M (6)    0.001400     0.001738     0.8         6         0.2804     0.72     0.64         0.80
M (6)    0.002800     0.003475     0.8         7         0.1325     0.41     0.25         0.67
M (6)    >=0.005600   >=0.006950   0.8         8-11      <=0.1128   NA       NA           NA
N (10)   4.97E-06     2.47E-06     2           1         1.0000     1.37     0.32         5.84
N (10)   3.98E-05     1.97E-05     2           2         0.9998     1.17     0.47         2.89
N (10)   0.000126     0.000062     2           3         0.9967     1.08     0.59         1.98
N (10)   0.000398     0.000197     2           4         0.9417     0.99     0.72         1.37
N (10)   0.000795     0.000395     2           5         0.7418
N (10)   0.001591     0.000790     2           6         0.3742
N (10)   0.003182     0.001580     2           7         0.1721
N (10)   >=0.006364   >=0.003159   2           8-11      <=0.1236
O (12)   5.26E-06     1.04E-06     5           1         1.0000
O (12)   4.21E-05     8.35E-06     5           2         0.9995
O (12)   0.000133     2.64E-05     5           3         0.9934
O (12)   0.000421     8.35E-05     5           4         0.9227
O (12)   0.000841     0.000167     5           5         0.7294
O (12)   0.001683     0.000334     5           6         0.4094
O (12)   0.003365     0.000668     5           7         0.2060
O (12)   0.006731     0.001337     5           8         0.1420
O (12)   >=0.021300   >=0.004227   5           9-11      <=0.1239
P (14)   5.36E-06     5.32E-07     10          1         1.0000
P (14)   4.29E-05     4.26E-06     10          2         0.9988
P (14)   0.000136     1.35E-05     10          3         0.9887
P (14)   0.000429     4.26E-05     10          4         0.9015
P (14)   0.000858     8.52E-05     10          5         0.7095
P (14)   0.001716     0.000170     10          6         0.4221
P (14)   0.003431     0.000341     10          7         0.2272
P (14)   0.006863     0.000681     10          8         0.1544
P (14)   >=0.021700   >=0.002155   10          9-11      <=0.1292

Table 3. Summary of parameter estimates (Standard Error) for the High FA case

Curve    Dose ratio  Emax            ED50                  m               Residual sum
         (TMQ/AG)                                                          of squares
A (8)
B (16)
C (1)                0.883 (0.012)   0.0137 (0.0012)       3.625 (0.650)   0.1074
D (2)                0.831 (0.015)   0.5224 (0.0439)       1.468 (0.137)   0.0770
E (15)   0.0004      0.867 (0.014)   0.1943 (0.0122)       2.558 (0.405)   0.1134
F (13)   0.0008      0.863 (0.010)   0.1447 (0.0068)       2.643 (0.258)   0.0852
G (11)   0.002       0.859 (0.010)   0.0912 (0.0045)       2.996 (0.355)   0.0999
H (7)    0.005       0.881 (0.006)   0.0699 (0.0027)       2.887 (0.253)   0.0746
I (5)    0.01        0.881 (0.009)   0.0484 (0.0026)       2.528 (0.251)   0.0977
J (3)    0.02        0.884 (0.006)   0.0331 (0.0011)       2.114 (0.136)   0.0615
K (9)    0.02        0.885 (0.008)   0.0369 (0.0019)       2.160 (0.195)   0.0861
L (4)    0.04        0.886 (0.008)   0.0288 (0.0014)       2.504 (0.255)   0.0959
M (6)    0.08        0.885 (0.009)   0.0197 (0.0010)       2.242 (0.214)   0.0881
N (10)   0.2         0.862 (0.010)   0.0154 (0.0007)       3.309 (0.415)   0.0909
O (12)   0.5         0.878 (0.009)   0.0139 (0.0006)       3.491 (0.405)   0.0933
P (14)   1           0.893 (0.008)   0.0183 (0.0009)       2.735 (0.213)   0.0669

Table  4.  Estimated  interaction  index  and  its  95%  confidence  interval  at  each  dose  combination  for  the  High 

FA  case 


Curve 

TMQ 

dose 

AG2034 

dose 

Dose  Ratio 
(TMQ/AG) 

Dilution 

Predicted 

Effect 

II 

95%  Cl  for  II 

Lower  limit 

Upper  limit 

A  (8) 

B  (16) 

C(l) 

D  (2) 

E  (15) 

1.07E-07 

0.000266 

0.0004 

1 

1.0000 

48.28 

2.53 

922.71 

8.58E-07 

0.002128 

2 

1.0000 

10.31 

1.36 

78.32 

2.71E-06 

0.006729 

3 

0.9998 

4.39 

0.96 

20.02 

8.58E-06 

0.021278 

4 

0.9970 

1.87 

0.68 

5.14 

1.72E-05 

0.042555 

5 

0.9825 

1.12 

0.55 

2.28 

3.43E-05 

0.085110 

6 

0.9063 

0.67 

0.44 

1.02 

6.86E-05 

0.170221 

7 

0.6388 

0.40 

0.32 

0.48 

0.000137 

0.340441 

8 

0.2994 

0.21 

0.16 

0.28 

>=0.000434 

>=1.076570 

9-  11 

<=0.1433 

NA 

NA 

NA 

F  (13) 

2.10E-07 

0.000261 

0.0008 

1 

1.0000 

42.38 

3.74 

479.71 

1.68E-06 

0.002087 

2 

1.0000 

8.02 

1.56 

41.25 

5.32E-06 

0.006599 

3 

0.9998 

3.20 

0.96 

10.63 

1.68E-05 

0.020868 

4 

0.9949 

1.27 

0.59 

2.75 

3.37E-05 

0.041737 

5 

0.9688 

0.73 

0.44 

1.23 

6.73E-05 

0.083474 

6 

0.8363 

0.42 

0.32 

0.56 

0.000135 

0.166947 

7 

0.4876 

0.24 

0.20 

0.28 

0.000269 

0.333894 

8 

0.2224 

0.11 

0.08 

0.16 

>=0.000851 

>=1.055866 

9-  11 

<=0.1418 

NA 

NA 

NA 

G  (11) 

4.97E-07 

0.000247 

0.002 

1 

1.0000 

80.07 

5.84 

1097.54 

3.98E-06 

0.001973 

2 

1.0000 

9.20 

1.67 

50.66 

1.26E-05 

0.006239 

3 

0.9997 

2.78 

0.84 

9.26 

3.98E-05 

0.019730 

4 

0.9913 

0.85 

0.42 

1.71 

7.95E-05 

0.039460 

5 

0.9351 

0.42 

0.28 

0.63 

0.000159 

0.078920 

6 

0.6609 

0.21 

0.17 

0.25 

0.000318 

0.157841 

7 

0.2796 

0.10 

0.08 

0.12 

>=0.000636 

>=0.315682 

8-  11 

<=0.1614 

NA 

NA 

NA 

H(7) 

1.09E-06 

0.000217 

0.005 

1 

1.0000 

33.80 

3.34 

342.20 

8.75E-06 

0.001736 

2 

1.0000 

4.54 

1.03 

20.08 

2.77E-05 

0.005491 

3 

0.9994 

1.50 

0.54 

4.20 

8.75E-05 

0.017363 

4 

0.9843 

0.51 

0.29 

0.90 

0.000175 

0.034725 

5 

0.8955 

0.27 

0.20 

0.37 

0.000350 

0.069450 

6 

0.5603 

0.15 

0.13 

0.17 

0.000700 

0.138900 

7 

0.2239 

0.07 

0.06 

0.09 

>=0.001400 

>=0.277800 

8-  11 

<=0.1344 

NA 

NA 

NA 

1(5) 

1.82E-06 

0.000181 

0.01 

1 

1.0000 

4.94 

0.61 

39.76 

1.46E-05 

0.001447 

2 

0.9999 

1.11 

0.30 

4.13 

4.61E-05 

0.004575 

3 

0.9977 

0.50 

0.21 

1.20 

0.000146 

0.014469 

4 

0.9592 

0.23 

0.15 

0.37 

0.000292 

0.028938 

5 

0.8071 

0.16 

0.12 

0.20 

0.000583 

0.057875 

6 

0.4556 

0.11 

0.09 

0.13 

0.001167 

0.115750 

7 

0.2041 

0.07 

0.06 

0.09 

>=0.002333 

>=0.231500 

8  -  11 

<=0.1347 

NA 

NA 

NA 

J(3) 

2.73E-06 

0.000136 

0.02 

1 

1.0000 

0.67 

0.13 

3.34 

2.19E-05 

0.001085 

2 

0.9993 

0.28 

0.10 

0.74 

6.92E-05 

0.003432 

3 

0.9924 

0.18 

0.09 

0.34 

0.000219 

0.010852 

4 

0.9206 

0.13 

0.09 

0.17 

0.000438 

0.021703 

5 

0.7354 

0.11 

0.09 

0.13 

0.000875 

0.043406 

6 

0.4261 

0.10 

0.09 

0.12 

0.001750 

0.086813 

7 

0.2139 

0.10 

0.08 

0.11 

>=0.003500 

>=0.173625 

8-  11 

<=0.1405 

NA 

NA 

NA 

K(9)     2.73E-06     0.000136     0.02    1        1.0000     0.93     0.15    5.72
         2.19E-05     0.001085             2        0.9995     0.36     0.12    1.12
         6.92E-05     0.003432             3        0.9946     0.22     0.11    0.47
         0.000219     0.010852             4        0.9390     0.15     0.10    0.22
         0.000438     0.021703             5        0.7800     0.13     0.10    0.16
         0.000875     0.043406             6        0.4722     0.11     0.10    0.13
         0.001750     0.086813             7        0.2316     0.11     0.09    0.13
         >=0.003500   >=0.173625           8-11     <=0.1443   NA       NA      NA

L(4)     3.65E-06     0.000090     0.04    1        1.0000     2.89     0.34    24.37
         2.92E-05     0.000723             2        0.9999     0.69     0.18    2.62
         9.22E-05     0.002288             3        0.9983     0.33     0.14    0.80
         0.000292     0.007234             4        0.9702     0.18     0.12    0.29
         0.000583     0.014469             5        0.8538     0.15     0.11    0.19
         0.001167     0.028938             6        0.5312     0.13     0.11    0.15
         0.002333     0.057875             7        0.2326     0.12     0.10    0.15
         >=0.004667   >=0.115750           8-11     <=0.1355   NA       NA      NA

M(6)     4.38E-06     5.43E-05     0.08    1        1.0000     0.73     0.10    5.17
         3.50E-05     0.000434             2        0.9998     0.27     0.08    0.89
         0.000111     0.001373             3        0.9973     0.17     0.08    0.37
         0.000350     0.004341             4        0.9660     0.13     0.09    0.21
         0.000700     0.008681             5        0.8594     0.13     0.10    0.17
         0.001400     0.017363             6        0.5823     0.14     0.12    0.16
         0.002800     0.034725             7        0.2842     0.16     0.13    0.18
         >=0.005600   >=0.069450           8-11     <=0.1571   NA       NA      NA

N(10)    4.97E-06     2.47E-05     0.2     1        1.0000     61.54    3.00    1262.99
         3.98E-05     0.000197             2        1.0000     4.66     0.64    34.02
         0.000126     0.000624             3        1.0000     1.21     0.31    4.73
         0.000398     0.001973             4        0.9983     0.41     0.20    0.87
         0.000795     0.003946             5        0.9830     0.28     0.17    0.46
         0.001591     0.007892             6        0.8570     0.23     0.17    0.31
         0.003182     0.015784             7        0.4280     0.21     0.18    0.25
         0.006364     0.031568             8        0.1800     0.23     0.18    0.30
         >=0.020124   >=0.099827           9-11     <=0.1390   NA       NA      NA

O(12)    5.26E-06     1.04E-05     0.5     1        1.0000     194.98   6.90    5509.68
         4.21E-05     8.35E-05             2        1.0000     11.38    1.16    111.41
         0.000133     0.000264             3        1.0000     2.57     0.51    12.90
         0.000421     0.000835             4        0.9998     0.78     0.30    1.97
         0.000841     0.001669             5        0.9978     0.50     0.25    0.98
         0.001683     0.003339             6        0.9754     0.40     0.24    0.64
         0.003365     0.006678             7        0.7849     0.36     0.27    0.48
         0.006731     0.013356             8        0.3109     0.35     0.30    0.41
         >=0.021285   >=0.042235           9-11     <=0.1263   NA       NA      NA

P(14)    5.36E-06     5.32E-06     1       1        1.0000     10.38    0.65    165.28
         4.29E-05     4.26E-05             2        1.0000     1.89     0.29    12.19
         0.000136     0.000135             3        1.0000     0.87     0.24    3.15
         0.000429     0.000426             4        0.9998     0.55     0.23    1.29
         0.000858     0.000851             5        0.9986     0.50     0.25    1.03
         0.001716     0.001702             6        0.9910     0.51     0.29    0.92
         0.003431     0.003404             7        0.9436     0.56     0.37    0.86
         0.006863     0.006809             8        0.7232     0.64     0.50    0.83
         0.021702     0.021531             9        0.1851     0.80     0.65    0.99
         >=0.068627   >=0.068088           10, 11   <=0.1109   NA       NA      NA
Abstract


Background 

Response-adaptive  randomizations  are  able  to  assign  more  patients  in  a  comparative  clinical  trial 
to  the  tentatively  better  treatment.  However,  due  to  the  adaptation  in  patient  allocation,  the  samples 
to  be  compared  are  no  longer  independent.  At  large  sample  sizes,  many  asymptotic  properties  of 
test  statistics  derived  for  independent  sample  comparison  are  still  applicable  in  adaptive 
randomization  provided  that  the  patient  allocation  ratio  converges  asymptotically.  However,  the 
small  sample  properties  of  commonly  used  test  statistics  in  response-adaptive  randomization  are 
not  fully  studied. 

Methods 

Simulations are systematically conducted to characterize the statistical properties of 9 test
statistics in 6 response-adaptive randomization methods at 6 allocation targets with sample sizes
ranging from 20 to 200. Since adaptive randomization is usually not recommended for sample sizes
smaller than 30, the present paper focuses on the case with a sample size of 30 to give general
recommendations with regard to test statistics for contingency tables in response-adaptive
randomization at small sample sizes.

Results 

Among all asymptotic test statistics, Cook’s correction to the Chi-square test (TMc) is the best in
attaining the nominal size of the hypothesis test. Williams’ correction to the log-likelihood-ratio test
(TMl) gives a slightly inflated type I error and higher power compared with TMc, but it is more
robust against imbalance in patient allocation. TMc and TMl are usually the two test statistics
with the highest power in different simulation scenarios. When focusing on TMc and TMl, the
generalized drop-the-loser urn (GDL) has the best ability to attain the correct size of the hypothesis
test. Among all sequential methods that can target different allocation ratios, GDL has the lowest
variation and the highest overall power at all allocation ratios. The performance of different
adaptive randomization methods and test statistics also depends on allocation targets. At the
limiting allocation ratio of the drop-the-loser (DL) and randomized play-the-winner (RPW) urns, DL
outperforms all other methods including GDL. When comparing the power of test statistics in the same
randomization method but at different allocation targets, the powers of the log-likelihood-ratio,
log-relative-risk, log-odds-ratio, Wald-type Z, and Chi-square test statistics are maximized at their
corresponding optimal allocation ratios for power. Except for the optimal allocation target for
log-relative-risk, the other 4 optimal targets could assign more patients to the worse arm in some
simulation scenarios. Another optimal allocation target, RRSIHR, proposed by Rosenberger and
Sriram (Journal of Statistical Planning and Inference, 1997), is aimed at minimizing the number of
failures at fixed power using Wald-type Z test statistics. Among allocation ratios that always
assign more patients to the better treatment, RRSIHR usually has less variation in patient allocation,
and the value of the variation is consistent across all simulation scenarios. Additionally, the patient
allocation at RRSIHR is not extreme. Therefore, RRSIHR provides a good balance between
assigning more patients to the better treatment and maintaining the overall power.

Conclusions 

Cook’s correction to the Chi-square test and Williams’ correction to the log-likelihood-ratio test are
generally recommended for hypothesis testing in response-adaptive randomization, especially when
sample sizes are small. The generalized drop-the-loser urn design is the recommended method for
its good overall properties. Also recommended is the use of the RRSIHR allocation target.
Background


The  response-adaptive  randomization  (RAR)  in  clinical  trials  is  a  class  of  flexible  ways  of 
assigning  treatment  to  new  patients  sequentially  based  on  available  data.  The  RAR  adjusts  the 
allocation  probabilities  to  reflect  the  interim  results  of  the  trial,  thereby  allowing  patients  to  benefit 
from  the  interim  knowledge  as  it  accumulates  in  the  trial.  In  practice,  unequal  allocation 
probabilities  are  generated  based  on  the  current  assessment  of  treatment  efficacy,  which  results  in 
more  patients  being  assigned  to  the  treatment  that  is  putatively  superior. 

Many RAR designs have been proposed over the years [1-13]. The two key issues extensively
investigated are the evaluation of parameter estimation and hypothesis testing. Because the
assignment of each new patient depends on the data observed up to that time, conventional estimates
of a treatment effect are often biased; therefore, efforts have been made to quantify and correct
estimation bias [14, 15]. Recent theoretical work has focused on solving problems
encountered in practice, which include delayed response, implementation for multi-arm trials,
and incorporating covariates [1, 3, 11, 16-18]. Many recent theoretical developments are
summarized in [19]. Additionally, in order to compare treatment efficacies through hypothesis
testing, studies have been conducted on power comparisons and sample size calculations under the
framework of adaptive randomization [20-24]. However, most of the work focusing on
asymptotic properties is based on large sample sizes [4, 12, 22, 25, 26]; thus, these properties have
not been fully studied with small sample sizes. The mathematical challenge imposed by the
correlation of data makes it extremely difficult to derive exact solutions for finite samples. Up to
now, only limited results on exact solutions have been available [15, 27], and computer simulation
has to be relied upon when the sample size is small [23, 24], which is often the case in early phase II
trials.

Each  RAR  design  has  its  own  objective,  and  there  are  both  advantages  and  disadvantages 
associated  with  that  objective.  It  is  not  our  purpose  to  give  a  comprehensive  assessment  of 
different  designs  by  comparing  their  advantages  and  disadvantages.  Instead,  the  primary  objective 
of  the  present  study  is  to  characterize  the  small  sample  properties  of  RAR  based  on  a  frequentist 
approach.  In  particular,  we  focus  on  comparing  the  performance  of  commonly  used  test  statistics 
in RAR of two-arm comparative trials with a binary outcome. Due to the departure from normality
caused by data correlation and the discrete nature of a binary outcome, hypothesis tests usually
cannot be controlled at the nominal significance level. Thus, to make our simulation comparison
more relevant, our assessment of hypothesis testing methods and RAR procedures is based on
both the statistical power and the agreement of the type I error rate with its nominal level. Several
RAR methods studied in our simulations can assign patients according to a given allocation target,
which may be optimal in terms of maximizing the power or minimizing the expected number of
treatment failures. Therefore, we also compare the properties of test statistics at different optimal
allocation targets.

The remaining parts of this paper are organized into 4 sections. In the Methods section, we
introduce the adaptive randomization procedures, the optimal allocation rates, and the test
statistics used in the simulation. In the Results section, we present the simulation results. We
provide a discussion and final recommendations regarding the RAR methods and hypothesis tests
in the Discussion and Conclusions sections.
Methods


In  the  present  section,  we  briefly  describe  the  randomization  methods,  asymptotic  hypothesis  test 
statistics,  and  optimal  patient  allocation  targets  that  are  relevant  to  our  simulations.  More  detailed 
information  can  be  found  in  the  corresponding  references. 

Response-based  Adaptive  Randomization  (RAR) 

The RAR procedures investigated in the present study are the randomized play-the-winner urn
(RPW) [8, 10], the drop-the-loser urn (DL) [28], the sequential maximum likelihood estimation
design (SMLE) [12], the doubly-adaptive biased coin design (DBCD) [2, 3], the sequential
estimation-adjusted urn design (SEU) [13], and the generalized drop-the-loser urn (GDL) [11].

RPW, DL, SEU and GDL are all urn models in the sense that the treatment for each patient is
selected by sampling balls from an urn. In the usual clinical trial setting, an urn model consists of
one urn with different types of balls that represent the different treatments under study. Patients
are assigned to treatments by randomly selecting balls from the urn. Initially, the urn contains an
equal number of balls for each of the treatments offered in the trial. As the clinical
trial progresses, certain rules are applied to update the contents of the urn in a way that favors the
selection of balls corresponding to the better treatment. For example, under the RPW design, the
observation of a successful treatment response leads to the addition of a (>0) balls of the same
type to the urn; a lack of success leads to the addition of b (>0) balls of the other type to the urn
(a = b = 1 in our simulation). The limiting allocation rate of patients on treatment 1 is $q_2/(q_1+q_2)$,
where $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$ are failure rates, and $p_1$ and $p_2$ are success rates (or response
rates) for treatments 1 and 2. In the DL model, patients are assigned to a treatment based on the type of
ball that is drawn; however, a treatment failure results in the removal of a treatment ball from the
urn, and treatment successes are ignored. Because there is a finite probability that the treatment balls
become extinct, immigration balls are added to the urn. If an immigration ball is drawn, an additional
ball of each type is added. The sampling process is repeated until a treatment ball is drawn. The DL urn
design has the same limiting allocation as the RPW urn, but less variability in patient allocation.
Both SEU and GDL are urn models that allow fractional numbers of balls and can target any
allocation rate. For the SEU method [13], if the limiting allocation of the RPW urn is the target in a
two-arm trial, then $\hat{q}_1(i)/(\hat{q}_1(i)+\hat{q}_2(i))$ balls of type 2 and
$\hat{q}_2(i)/(\hat{q}_1(i)+\hat{q}_2(i))$ balls of type 1
are added to the urn following the allocation of the ith patient. Obviously, the response status of
the ith patient is related to the contents of the SEU urn only through the calculation of $\hat{q}_1(i)$ and
$\hat{q}_2(i)$. For a two-arm GDL urn model [11], when a treatment ball is drawn, a new patient is
assigned accordingly, but the ball is not returned to the urn. Depending on the response of
the patient, the conditional average numbers of balls added back to the urn are $b_1$ and $b_2$,
respectively, for treatments 1 and 2. Therefore, the conditional average numbers of type 1 and
type 2 balls taken out of the urn can be defined as $d_1$ and $d_2$, where $d_1 = 1 - b_1$ and $d_2 = 1 - b_2$.
Immigration balls are also present in a GDL urn. Whenever an immigration ball is drawn, $a_1$ and
$a_2$ balls, respectively, are added for treatments 1 and 2. Zhang et al [11] have shown that the
limiting allocation rate of patients on treatment 1 is

\[ \frac{a_1/d_1}{a_1/d_1 + a_2/d_2}. \qquad (1) \]

The GDL urn becomes a DL urn when $a_1 = 1$, $a_2 = 1$, $b_1 = p_1$, and $b_2 = p_2$. Although GDL is a general
method with different ways of implementation, a convenient approach is taken in our simulation.
When a treatment ball is drawn, the ball is not returned, and no ball is added regardless of the
response of the patient. When an immigration ball is drawn, $C\rho_1$ and $C\rho_2$ balls of type 1 and 2 are
added, where C is a constant, and $\rho_1$ and $\rho_2$ are the allocation targets on treatments 1 and 2, which
are estimated sequentially using the maximum likelihood estimates (MLE) [11].
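
The RPW and DL urn rules described above can be sketched in a few lines of Python. This is a minimal illustration and not the simulation code used in the study: the function names, the seed, and the response rates are assumptions chosen for the example, with one initial ball per treatment, a = b = 1 for RPW, and a single immigration ball for DL. For a large number of patients, both designs should allocate roughly a fraction $q_2/(q_1+q_2)$ of patients to treatment 1.

```python
import random


def simulate_rpw(n, p1, p2, rng, a=1, b=1):
    """Randomized play-the-winner urn; returns the fraction of patients on arm 1."""
    balls = [1.0, 1.0]            # one ball per treatment initially
    p = (p1, p2)
    n1 = 0
    for _ in range(n):
        # Draw a ball with replacement; its type decides the assignment.
        arm = 0 if rng.random() < balls[0] / (balls[0] + balls[1]) else 1
        n1 += (arm == 0)
        if rng.random() < p[arm]:
            balls[arm] += a       # success: add balls of the same type
        else:
            balls[1 - arm] += b   # failure: add balls of the other type
    return n1 / n


def simulate_dl(n, p1, p2, rng, immigration=1):
    """Drop-the-loser urn; returns the fraction of patients on arm 1."""
    balls = [1, 1]
    p = (p1, p2)
    n1 = 0
    for _ in range(n):
        while True:
            u = rng.random() * (immigration + balls[0] + balls[1])
            if u < immigration:
                # Immigration ball: add one ball of each type and draw again.
                balls[0] += 1
                balls[1] += 1
                continue
            arm = 0 if u < immigration + balls[0] else 1
            break
        n1 += (arm == 0)
        if rng.random() >= p[arm]:
            balls[arm] -= 1       # failure: the drawn ball is removed
        # success: the ball is returned, so the urn is unchanged
    return n1 / n


rng = random.Random(12345)
p1, p2 = 0.7, 0.3                 # hypothetical response rates
target = (1 - p2) / ((1 - p1) + (1 - p2))
print("target:", round(target, 3))
print("RPW   :", round(simulate_rpw(5000, p1, p2, rng), 3))
print("DL    :", round(simulate_dl(5000, p1, p2, rng), 3))
```

Repeating each simulation many times at n = 30 and comparing the spread of the returned fractions is one way to see the lower variability of DL noted in the text.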

The SMLE and DBCD methods can also target any allocation ratio, and SMLE can be
implemented as a special case of the DBCD method. In the DBCD method, the probability of the (i+1)th
patient being assigned to treatment 1 is calculated by


\[ \Pr[\text{the } (i+1)\text{th patient is assigned to treatment 1}] = g\!\left(\frac{n_1(i)}{i},\ \hat{\rho}_1(i)\right) \qquad (2) \]


where $n_1(i)/i$ and $\hat{\rho}_1(i)$ are the current allocation rate and the estimated target allocation
rate on treatment 1 [2, 3]. The properties of the DBCD depend largely on the selection of g, which can
be considered as a measuring function for the deviation from the allocation target. In the present
study, we use the following function suggested by Hu and Zhang [3]:


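\[ g(r, \rho) = \frac{\rho\,(\rho/r)^{\gamma}}{\rho\,(\rho/r)^{\gamma} + (1-\rho)\,\bigl((1-\rho)/(1-r)\bigr)^{\gamma}}, \qquad g(0, \rho) = 1, \qquad g(1, \rho) = 0, \qquad (3) \]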


where $\gamma$ is a tuning parameter. When $\gamma$ approaches infinity, the DBCD becomes deterministic and
the patients are assigned to the putatively better treatment with probability 1. When $\gamma$ is equal to
0, $g(r, \rho) = \rho$, so the MLE of $\rho$ is used directly as the allocation probability, and the DBCD method
is essentially the same as the SMLE design proposed by Melfi et al [12].
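
As a small illustration of the allocation function in (3), the sketch below evaluates g for a few values of the tuning parameter. The function name and the example inputs are assumptions made for this illustration, and the target is taken to lie strictly between 0 and 1.

```python
def dbcd_allocation_prob(r, rho, gamma):
    """Allocation function g(r, rho) of equation (3).

    r     : current proportion of patients on treatment 1
    rho   : (estimated) target allocation on treatment 1, with 0 < rho < 1
    gamma : tuning parameter, gamma >= 0
    """
    if r <= 0.0:
        return 1.0                # boundary case g(0, rho) = 1
    if r >= 1.0:
        return 0.0                # boundary case g(1, rho) = 0
    num = rho * (rho / r) ** gamma
    den = num + (1.0 - rho) * ((1.0 - rho) / (1.0 - r)) ** gamma
    return num / den


# Arm 1 currently holds 50% of the patients while the target is 60%.
for gamma in (0, 2, 10):
    print(gamma, round(dbcd_allocation_prob(0.5, 0.6, gamma), 4))
```

With gamma = 0 the function returns the target itself, which is the SMLE rule; larger values of gamma push the probability more strongly toward the arm that is currently below its target share.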


Hypothesis  Tests  for  Two-Arm  Comparative  Trials 

In two-arm comparative trials, the results of a binary outcome variable can be summarized by a
2x2 contingency table (Table 1). The following hypothesis test is often conducted to compare
treatment efficacies:

\[ H_0: p_1 = p_2 \quad \text{versus} \quad H_1: p_1 \neq p_2 \qquad (4) \]


Nine test statistics for the hypothesis test in (4) are given in Table 2. When the relative risk ($q_1/q_2$)
and the odds ratio ($p_1 q_2 / q_1 p_2$) are used to quantify the differences between the 2 treatment arms, the
test statistics are the log-relative-risk and log-odds-ratio statistics, TRisk and TOdds, which are
asymptotically distributed as the Chi-square distribution with one degree of freedom ($\chi^2_1$). When the
simple difference is used to measure the treatment effect, the applicable test statistics are the Wald-type test
statistic TWald and the score-type test statistic TChisq, where the variance of the simple difference in
response rates is evaluated at H1 or H0, respectively. Additionally, a test statistic based on the
log of the likelihood ratio (TLLR) can also be constructed. Besides the 5 commonly used test statistics
mentioned above, 4 modified test statistics are also included in Table 2. TMo is a modified
log-odds-ratio test proposed by Gart using the approximation of discrete distributions by their
continuous analogues [29]. As shown in Table 2, TMo is essentially a modification to TOdds by
adding 0.5 to each cell of a 2x2 table. Similarly, Agresti and Caffo proposed a modification to
TWald by adding 1 to each cell of a contingency table [30], which results in the test statistic TMw in
Table 2. TMc is Cook’s continuity correction to the Chi-square test statistic TChisq. Williams
provided a modification to the log-likelihood-ratio test TLLR [31]. The original test statistic TLLR is
improved by multiplying it by a scale factor such that the null distribution of the new test statistic TMl
has the same moments as the Chi-square distribution.
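
To make the verbal descriptions above concrete, the following sketch computes four of the statistics for a single 2x2 table. Table 2 is not reproduced in this section, so the formulas below are the usual textbook forms (each written as a squared statistic referred to the Chi-square distribution with one degree of freedom) and should be read as assumptions; Cook’s and Williams’ corrections are omitted because their formulas are not given in the text. The function names and the example counts are hypothetical.

```python
import math


def chi2_1_pvalue(t):
    """Upper-tail probability of a Chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(t / 2.0))


def wald_type(r1, n1, r2, n2, add=0.0):
    """Squared Wald statistic (variance evaluated at H1).

    add=1.0 adds 1 to each cell, in the spirit of the Agresti-Caffo adjustment.
    """
    m1, m2 = n1 + 2 * add, n2 + 2 * add
    p1, p2 = (r1 + add) / m1, (r2 + add) / m2
    var = p1 * (1 - p1) / m1 + p2 * (1 - p2) / m2
    return (p1 - p2) ** 2 / var


def score_type(r1, n1, r2, n2):
    """Score (Pearson Chi-square) statistic: variance evaluated at H0."""
    pooled = (r1 + r2) / (n1 + n2)
    var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    return (r1 / n1 - r2 / n2) ** 2 / var


def log_odds(r1, n1, r2, n2, add=0.0):
    """Squared log-odds-ratio statistic; add=0.5 adds 0.5 to each cell.

    With add=0 the statistic is undefined if any cell is zero.
    """
    a, b = r1 + add, n1 - r1 + add
    c, d = r2 + add, n2 - r2 + add
    log_or = math.log(a * d / (b * c))
    return log_or ** 2 / (1 / a + 1 / b + 1 / c + 1 / d)


# Hypothetical trial: 12/18 responders on arm 1 and 4/12 on arm 2.
table = (12, 18, 4, 12)
for name, stat in [
    ("Wald", wald_type(*table)),
    ("Wald, +1 per cell", wald_type(*table, add=1.0)),
    ("Score", score_type(*table)),
    ("Log odds", log_odds(*table)),
    ("Log odds, +0.5 per cell", log_odds(*table, add=0.5)),
]:
    print(f"{name:24s} T = {stat:.3f}  p = {chi2_1_pvalue(stat):.4f}")
```

In this example the cell-adjusted versions are smaller than their unadjusted counterparts, which agrees with the conservative behavior of TMw and TMo reported for small samples.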

Since all test statistics in Table 2 are based on the $\chi^2_1$ distribution, they are asymptotically
equivalent and any one of them can be used for large sample sizes. Meanwhile, at small sample sizes, an
exact test can be conducted if a model is specified for the data given in Table 1. For example,
depending on the number of fixed margins predetermined for the design, one of the following 3 models
can be applied [32]:

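\[ \Pr(r_1 \mid n, n_1, r) = h(r_1 \mid n, n_1, r), \qquad (5) \]

\[ \Pr(r_1, r \mid n, n_1, p) = h(r_1 \mid n, n_1, r)\, b(r \mid n, p), \qquad (6) \]

and

\[ \Pr(r_1, r, n_1 \mid n, p, \rho) = h(r_1 \mid n, n_1, r)\, b(r \mid n, p)\, b(n_1 \mid n, \rho), \qquad (7) \]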

where $h(r_1 \mid n, n_1, r)$ represents the hypergeometric distribution of $r_1$, $b(r \mid n, p)$ gives the
binomial distribution of r under the null hypothesis of equal response rates (H0: $p_1 = p_2 = p$), and
$b(n_1 \mid n, \rho)$ denotes the binomial distribution of the number of patients on arm 1 with an allocation
ratio of $\rho$ ($\rho$ = 0.5 for equal randomization). The p-value of the exact test can be calculated by
maximizing the probability in (5), (6), or (7) over the 2 nuisance parameters, p and $\rho$. However, due to
data dependency, none of the above 3 models is applicable in adaptive randomization. For example,
the allocation ratio $\rho$ in adaptive randomization is a random variable with an unknown distribution,
and the binomial distribution of $n_1$ assumed in model (7) is not valid even when the null
hypothesis is true. Therefore, unconditional exact tests are not available in adaptive
randomization, and asymptotic test statistics such as the ones in Table 2 are needed to test the
hypothesis in (4) for adaptive randomization.
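
For readers who want to see the three factors of model (7) in computational form, the sketch below evaluates the hypergeometric and binomial terms and checks that the joint probabilities add up to one over all possible tables. It applies to fixed (non-adaptive) randomization only, which is exactly the setting the text says breaks down under RAR. The function names and the values n = 6, p = 0.3, and rho = 0.5 are assumptions for the example, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def hypergeom(r1, n, n1, r):
    """h(r1 | n, n1, r): responders on arm 1 given both margins of the table."""
    if r1 < max(0, r - (n - n1)) or r1 > min(n1, r):
        return 0.0                # outside the support of the distribution
    return comb(n1, r1) * comb(n - n1, r - r1) / comb(n, r)


def binom(k, n, prob):
    """b(k | n, prob): binomial probability of k events in n trials."""
    return comb(n, k) * prob ** k * (1 - prob) ** (n - k)


def model7(r1, r, n1, n, p, rho):
    """Joint probability of (r1, r, n1) under model (7)."""
    return hypergeom(r1, n, n1, r) * binom(r, n, p) * binom(n1, n, rho)


n, p, rho = 6, 0.3, 0.5
total = sum(
    model7(r1, r, n1, n, p, rho)
    for n1 in range(n + 1)
    for r in range(n + 1)
    for r1 in range(n + 1)
)
print("sum over all tables:", total)
```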

Optimal  Allocation  Ratios 

The SMLE, DBCD, SEU, and GDL methods can be utilized to allocate patients based on
different allocation targets. The allocation targets simulated in the present study are summarized
in Table 3, where RRisk, ROdds, RWald, RChisq, and RLLR are optimal allocation ratios maximizing the
power of TRisk, TOdds, TWald, TChisq, and TLLR, respectively, at a fixed sample size. The derivation of
RRisk, ROdds, RWald, RChisq, and RLLR can be found in [33, 34]. The method used is equivalent to
minimizing the variance of the corresponding test statistic at a fixed total sample size, and
consequently the power of that test statistic is maximized. RRSIHR is a recently proposed
allocation target that is optimal in minimizing the expected total number of failures among all
trials with the same power [15, 33]. The general theoretical framework and the practical
implementation of optimal allocation in k-arm trials with binary outcomes are discussed and
demonstrated by Tymofyeyev et al [35], where the optimization can be conducted over different
goals. In practice, the performance of the methodology depends on the chosen RAR procedure.
The present simulation study focuses only on two-arm trials, where straightforward
implementation can be achieved for maximizing the power or minimizing the total number of
failures.
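
The sketch below evaluates three allocation targets for a pair of response rates. Table 3 is not reproduced in this section, so the closed forms used here are assumptions taken from the wider literature: the urn limit $q_2/(q_1+q_2)$ given earlier for RPW and DL, Neyman allocation (the form commonly given for maximizing the power of the Wald-type test), and the square-root rule commonly associated with the RSIHR target. The function names and the example rates are hypothetical.

```python
import math


def target_rpw(p1, p2):
    """Limiting allocation on arm 1 for the RPW and DL urns: q2 / (q1 + q2)."""
    q1, q2 = 1 - p1, 1 - p2
    return q2 / (q1 + q2)         # undefined when p1 = p2 = 1


def target_neyman(p1, p2):
    """Neyman allocation: sqrt(p1 q1) / (sqrt(p1 q1) + sqrt(p2 q2))."""
    s1 = math.sqrt(p1 * (1 - p1))
    s2 = math.sqrt(p2 * (1 - p2))
    return s1 / (s1 + s2)         # undefined when both rates are 0 or 1


def target_rsihr(p1, p2):
    """Square-root rule: sqrt(p1) / (sqrt(p1) + sqrt(p2))."""
    return math.sqrt(p1) / (math.sqrt(p1) + math.sqrt(p2))


p1, p2 = 0.9, 0.7                 # arm 1 is the better treatment
for name, fn in [("RPW", target_rpw), ("Neyman", target_neyman), ("RSIHR", target_rsihr)]:
    print(f"{name:7s} allocation to arm 1: {fn(p1, p2):.3f}")
```

When both response rates are above 0.5, the Neyman form places fewer than half of the patients on the better arm, which illustrates how a power-optimal target can favor the worse treatment, while the other two forms stay above one half whenever arm 1 has the higher response rate.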

Results 

Simulations are conducted at different total numbers of patients ranging from 20 to 200. To
simplify the presentation, the results for trials with 30 patients are shown here. When there are
fewer than 30 patients, adaptive randomization is generally not recommended. For sample sizes of 100
or larger, all methods yield similar properties in general. For all of the urn models, one ball for each
treatment is consistently used as the initial contents of the urn. The number of immigration balls
is 1 for both the DL and GDL urns. The tuning parameter of DBCD, $\gamma$, is fixed at 0 or 2. When $\gamma$
is 0, it results in the SMLE method. The value of the constant C in GDL is 2, which is equivalent
to adding 2 treatment balls on average when an immigration ball is drawn. All simulation results
are calculated based on 10,000 simulation runs.
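
A Monte Carlo estimate of the kind described above can be sketched as follows. This is an illustrative outline, not the authors' simulation program: it uses a sequential-MLE allocation toward the urn target $q_2/(q_1+q_2)$, the uncorrected Chi-square statistic, and a smoothed failure-rate estimate (failures + 1)/(patients + 2) to avoid undefined ratios early in the trial. The smoothing, the handling of degenerate tables, the seeds, and all function names are assumptions. The critical value 3.841 is the 0.95 quantile of the Chi-square distribution with one degree of freedom.

```python
import random


def smle_trial(n, p1, p2, rng):
    """Simulate one two-arm trial; returns (successes, patients) per arm."""
    succ, size = [0, 0], [0, 0]
    p = (p1, p2)
    for _ in range(n):
        # Smoothed failure-rate estimates (the +1/+2 smoothing is an assumption).
        q_hat = [(size[k] - succ[k] + 1) / (size[k] + 2) for k in (0, 1)]
        prob_arm1 = q_hat[1] / (q_hat[0] + q_hat[1])
        arm = 0 if rng.random() < prob_arm1 else 1
        size[arm] += 1
        succ[arm] += rng.random() < p[arm]
    return succ, size


def pearson_chisq(succ, size):
    """Uncorrected Chi-square statistic for the 2x2 table."""
    n, r = size[0] + size[1], succ[0] + succ[1]
    if min(size) == 0 or r == 0 or r == n:
        return 0.0                # degenerate table: counted as no rejection
    pooled = r / n
    diff = succ[0] / size[0] - succ[1] / size[1]
    return diff ** 2 / (pooled * (1 - pooled) * (1 / size[0] + 1 / size[1]))


def rejection_rate(n, p1, p2, runs, seed, critical=3.841):
    """Fraction of simulated trials in which H0 is rejected at the 0.05 level."""
    rng = random.Random(seed)
    hits = sum(
        pearson_chisq(*smle_trial(n, p1, p2, rng)) > critical for _ in range(runs)
    )
    return hits / runs


print("type I error (p1 = p2 = 0.3):", rejection_rate(30, 0.3, 0.3, 2000, seed=1))
print("power (p1 = 0.8, p2 = 0.2)  :", rejection_rate(30, 0.8, 0.2, 2000, seed=2))
```

Reporting both numbers side by side follows the point made in the text that power comparisons are only meaningful when the type I error rate is examined as well.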

The simulation results for allocation rates on arm 1 are shown in Table 5. For the purpose of
comparison, the true allocation rates are shown in Table 4. Among all RAR methods, DBCD has
the best ability to attain the true allocation target. The comparison between SMLE and DBCD
shows that the allocation becomes more unbalanced and the variation of DBCD decreases with
increasing value of $\gamma$. On the other hand, SEU results in a more balanced
allocation between the two arms with a much larger variation compared with the other RAR methods.
GDL has the lowest variation among the 4 sequential RAR methods. When RRPW (the same
as RDL) is the allocation target, the DL urn method has the lowest variation in patient allocation,
which is consistent with the fact that the lower bound of the estimate of Var(RRPW) is attained by
the DL urn [4]. The comparison among allocation targets shows that RLLR has the lowest variation in
patient allocation, and the highest variation is usually found at RRisk or RRPW. However, RRPW and
RRisk are usually the top two allocation targets that assign more patients to the better treatment.
RWald, ROdds, and RLLR will assign more patients to the worse arm in some simulation cases.
Among the 3 allocation targets that assign more patients to the better treatment (RRSIHR, RRisk, and
RRPW), RRSIHR has a stable and often the lowest variation in patient allocation.

The simulation results for 5 null cases and 10 alternative cases are shown in Tables 6-11, with one
table for each of the six allocation targets. To simplify the presentation, the results are shown
only for the 4 modified test statistics TMw, TMo, TMc, and TMl, and the log-relative-risk test statistic
TRisk, because they tend to have better performance than the four corresponding unmodified tests.
Additionally, Table 12 summarizes the results of each test statistic in Tables 6-11 by averaging
the results over the 5 null cases and the 10 alternative cases for a given RAR method and at a
given allocation target. The qualitative comparisons among test statistics, RAR methods, and
allocation targets can be made based on the results in Table 12. Detailed comparison at a given
scenario can be found in Tables 6-11.
As shown in Table 12, the worst performance can be found in the results of the log-relative-risk test
statistic TRisk, which is often conservative, but can have a much inflated type I error at RRisk. TMw,
Agresti’s correction to TWald, is always slightly conservative across all simulation cases.
Meanwhile, TMo, the test using the log-odds-ratio, is very conservative, especially when the response
rate is low (also see Tables 6-11). Overall, Cook’s correction to the Chi-square test statistic, TMc, is
the best in attaining the correct type I error rate. The type I error of Williams’ correction to the
log-likelihood-ratio test, TMl, is slightly inflated compared with that of the Chi-square test TMc. The
simulation results not shown here indicate that TMl is very robust against imbalance in patient
allocation even when the sample size is 20. The comparison between different RAR methods shows that
the mean type I error of GDL can usually match the correct size of tests better than other methods.
The type I error of DBCD is usually the most inflated one, except at ROdds. The type I error of SEU is
comparable with GDL, but more conservative.

The power comparison of different test statistics indicates that TRisk is the statistic with the
highest power at RRisk and RRPW, but with a much inflated type I error. Except at RRisk and RRPW,
Cook’s correction to the Chi-square test, TMc, or Williams’ correction to the log-likelihood-ratio test,
TMl, is the one with the highest power. Usually, GDL has the highest power and SEU has the lowest
power among all RAR methods. DBCD and SMLE have similar power, but DBCD is more
powerful in most cases. At target RRPW, the DL urn has the best statistical properties. On average, the
target with the lowest power achieved by test statistics is RRisk. The highest overall power can
usually be achieved by test statistics at RRSIHR and RLLR, but RLLR has the disadvantage of
assigning more patients to the worse treatment in some cases.




Discussion 


In response-adaptive randomization, the assignment of a new patient depends on the treatment
outcomes of patients previously enrolled in the trial. Delayed responses are often encountered in
practice. Recently, the problem of delayed response in the multi-arm generalized drop-the-loser urn
and the generalized Friedman’s urn design has been studied for both continuous and discontinuous
outcomes [11, 16, 17, 36]. It has been shown that, under reasonable assumptions about the delay, the
asymptotic properties of the adaptive design are not affected by the delay. In the present study, the
primary focus is the comparison between commonly used test statistics for 2x2 tables. Based on results
not shown here, a less extreme allocation with higher variation would be expected when a random delay is
assumed. For the sake of simplicity, it is assumed in our simulations that the response status of each
of the patients already in the trial is available before the allocation of a new patient.

One  goal  of  adaptive  randomization  is  to  assign  more  patients  to  the  superior  treatment,  which  is 
meaningful  only  if  the  treatment  identified  as  better  during  the  trial  will  not  cause  serious  health 
problems  in  the  future.  Ethical  concerns  could  arise  should  patients  experience  unforeseeable, 
serious,  long-term  side  effects  from  the  treatment  when  adaptive  randomization  is  based  only  on 
short-term benefits. Thus, RAR is only suitable under the assumption that the treatments
putatively considered to be superior do not have serious long-term adverse effects. This point
holds  regardless  of  which  randomization  method  is  used. 

The RAR methods simulated in the present study are aimed at assigning patients to the better
treatment with probabilities higher than what would otherwise be allowed by equal
randomization. The price being paid is that the sample sizes on the two arms being compared are no
longer fixed, and the adaptation in patient allocation can complicate the statistical inference at
the end of the trial. The properties of test statistics will change when the patient allocation ratio
changes  in  adaptive  randomization.  The  power  of  test  statistics  shown  in  the  present  simulation 
is  obtained  by  averaging  over  trials  with  an  unknown  distribution  of  allocation  ratios.  As  shown 
in  our  simulation  results,  a  large  deviation  from  the  nominal  significance  level  of  the  hypothesis 
test  can  be  found  under  the  null  hypothesis.  Therefore,  the  practice  of  comparing  asymptotic 
hypothesis  testing  methods  based  solely  on  statistical  power  under  the  alternative  hypothesis  is 
not  recommended.  It  is  important  to  compare  adaptive  randomization  methods  based  on  both  the 
type  I  error  rate  and  the  statistical  power,  especially  when  the  sample  size  is  small. 

General recommendations given in the Results section are based on results aggregated across
different settings. Because the test statistics, RAR methods, and allocation targets interact
in determining performance, recommendations for a specific scenario can be drawn from the
detailed simulation results in Tables 6-11.

Based on the simulation results, Cook's correction to the Chi-square test statistic (T_MC) and
Williams' correction to the log-likelihood-ratio test (T_ML) are recommended for hypothesis
testing at the end of adaptive randomization. T_MC attains the correct significance level well
and is relatively robust to changes in RAR method or allocation target. T_ML is more robust
than T_MC and has higher power, but its type I error is slightly inflated compared with T_MC.
When the sample size is small, T_ML attains a more accurate type I error than T_MC. The
original Wald-type Z test statistic T_Wald, which is very sensitive to patient allocation and has
inflated type I error, should be avoided at small sample sizes. On the other hand, T_MW
(Agresti's correction to T_Wald) and T_MO (the modified log-odds-ratio test) are too
conservative and underpowered at small sample sizes.


The primary objective of the current study is to compare test statistics. Since the recommended
test statistics are T_MC and T_ML, the comparisons between RAR methods and allocation targets
are based mainly on these two statistics. Among the SMLE, DBCD, SEU, and GDL methods, GDL has
the best ability to attain the correct size of the hypothesis test and has comparatively
higher overall power at most allocation targets because of its low variation in patient
allocation. Therefore, GDL is the recommended RAR method. The sequential
estimation-adjusted urn (SEU) method is comparable with GDL in controlling the type I error.
However, SEU is often underpowered, and its high variation makes it less useful in practice.
The DBCD method with γ equal to 2 is the best at targeting the true allocation ratio. When T_MC
is the test statistic, DBCD has slightly inflated type I error and lower power compared with
GDL. Therefore, a balance between controlling the type I error, obtaining higher power, and
targeting a given allocation ratio can be reached when γ is equal to 2. The simulation
comparison of statistical power across RAR methods also indicates that the DL urn has the best
statistical properties at R_RPW, mainly because of its low variation in patient allocation.
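
The role of the DBCD tuning parameter (written γ here) can be illustrated with a short sketch. The allocation function below is the form commonly attributed to Hu and Zhang for the DBCD; it is not written out in this section, so treating it as the function used in the simulations is an assumption, and the function and argument names are hypothetical.

```python
def dbcd_allocation_probability(x, rho, gamma=2.0):
    """Probability of assigning the next patient to arm 1 (illustrative sketch).

    x     : current proportion of patients already on arm 1
    rho   : current estimate of the target allocation proportion for arm 1
    gamma : tuning parameter; gamma = 0 ignores x and simply returns rho
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    # Boundary cases: force the empty arm to receive the next patient.
    if x <= 0.0:
        return 1.0
    if x >= 1.0:
        return 0.0
    # Assumed allocation function: pushes the observed proportion x toward rho.
    a = rho * (rho / x) ** gamma
    b = (1.0 - rho) * ((1.0 - rho) / (1.0 - x)) ** gamma
    return a / (a + b)
```

With γ = 0 the function returns the estimated target itself, which corresponds to sequential estimation without any correction for the current imbalance. Larger values of γ correct an imbalance more aggressively, which is one way to read the lower allocation variation reported for DBCD in Table 5.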

The statistical characteristics of hypothesis tests and RAR methods also depend on the
allocation target. At the R_Wald, R_Odds, and R_LLR targets, more patients could be assigned to
the inferior treatment in certain regions of the parameter space. In contrast, R_Risk, R_RPW,
and R_RSIHR always assign more patients to the better treatment. However, because R_Risk and
R_RPW produce more extreme allocation, both their power and type I error suffer compared with
R_RSIHR. On the other hand, the variation of patient allocation at R_RSIHR is relatively small,
with a stable value across all simulation scenarios. Additionally, among all designs with
similar power using the Wald-type test statistic, the R_RSIHR allocation ratio achieves fewer
failures over the whole trial. Therefore, R_RSIHR is recommended among all the allocation
targets in the present study.

Although the simulation results are not shown here, for comparison purposes adaptive
randomization was also simulated using the Optimal Design (OD) [37] and Thompson's traditional
Bayesian method [38]. The OD is developed from Bayesian decision-theoretic analysis. A
well-defined utility is optimized not only over the n patients in the trial but also over N
additional future patients, referred to as the patient horizon. The commonly used utility is
the total number of successes in all (n+N) patients. The N patients outside the trial are
assigned to one treatment based on the decision made from the n patients in the trial. If a
large patient horizon is used, the potential loss due to a wrong decision at the end of the
trial is also large. Therefore, the OD method emphasizes collecting information during the
trial so that the probability of making the right decision is improved, which is equivalent to
increasing the frequentist power in hypothesis testing. If a small patient horizon is used, the
OD method focuses on assigning more patients to the better arm, which results in a more
unbalanced patient allocation. An efficient implementation of OD has to rely on backward
induction through dynamic programming, which is computationally expensive for large sample
sizes [39].

In Bayesian adaptive randomization of two-arm comparative trials with binary outcomes, it is
often assumed that the prior distributions of the response rates are Beta distributions:
$p_1 \sim \mathrm{Beta}(a_1, b_1)$ and $p_2 \sim \mathrm{Beta}(a_2, b_2)$. The posterior
distribution of each response rate is then also a Beta distribution, and the probability of a
new patient being assigned to arm 1 is calculated as $\Pr(p_1 > p_2 \mid \mathrm{Data})$, which
converges to 0 or 1 with increasing sample size. When the patient horizon is zero and Beta(1,1)
is used for both p_1 and p_2, the performance of all test statistics is poor under the OD and
Bayesian methods, primarily because of the more extreme allocation and higher variation
compared with the other RAR methods. However, this poor performance is caused by the chosen
implementation rather than by intrinsic properties of the methods. Many newer methods can be
used to achieve better frequentist characteristics in the Optimal Design [40] or Bayesian
methods. For example, lower and upper limits can be set for the randomization probability to
avoid extreme allocation probabilities close to 0 or 1.

The primary objective of the present study is to compare the commonly used test statistics.
Simulating the OD and Bayesian methods enables us to investigate the properties of test
statistics at allocation ratios more extreme than those found in the RAR methods. The
simulation results indicate that Williams' correction to the log-likelihood-ratio test is very
robust against extreme patient allocation. For example, when the sample size is only 20, the
type I error of T_ML can still be controlled at a reasonable level under the OD and Bayesian
methods. Cook's correction to the Chi-square test performs worse than T_ML, with a more
inflated type I error. On the other hand, the Wald-type test statistics T_Wald and T_MW are
extremely sensitive to unbalanced allocation ratios.
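
The Bayesian allocation rule described above, together with the idea of bounding the randomization probability, can be sketched as follows. This is a minimal illustration rather than the implementation used for the simulations: the posterior probability is estimated by Monte Carlo, and the function names, the number of draws, and the limits 0.1 and 0.9 are assumptions chosen only for the example.

```python
import random

def prob_arm1_better(r1, f1, r2, f2, prior1=(1.0, 1.0), prior2=(1.0, 1.0),
                     draws=20000, seed=1):
    """Monte Carlo estimate of Pr(p_1 > p_2 | Data) under independent Beta priors.

    r1, f1 : successes and failures observed on arm 1
    r2, f2 : successes and failures observed on arm 2
    """
    a1, b1 = prior1
    a2, b2 = prior2
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Conjugate update: the posterior is Beta(a + successes, b + failures).
        s1 = rng.betavariate(a1 + r1, b1 + f1)
        s2 = rng.betavariate(a2 + r2, b2 + f2)
        wins += s1 > s2
    return wins / draws

def bounded_probability(prob, lower=0.1, upper=0.9):
    """Clip the randomization probability away from 0 and 1 (limits are illustrative)."""
    return max(lower, min(upper, prob))
```

With strongly one-sided data the unbounded probability approaches 0 or 1, which is the extreme allocation the text describes; the clipped version keeps some randomization to both arms.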


Conclusion 

Cook's correction to the Chi-square test and Williams' correction to the log-likelihood-ratio
test are recommended for hypothesis testing in RAR trials at small sample sizes. Among all the
RAR methods compared, the GDL method has better statistical properties in controlling the type
I error and maintaining high statistical power. The RSIHR allocation target provides a good
balance between assigning more patients to the better treatment and maintaining high overall
power.

Abbreviations


RAR:  Response-adaptive  randomization 
RPW:  Randomized  play-the-winner 
DL:  Drop-the-loser 

DBCD:  Doubly-adaptive  biased  coin  design 
SMLE:  Sequential  maximum  likelihood  estimation  design 
SEU:  Sequential  estimation-adjusted  urn 
GDL:  Generalized  drop-the-loser  urn 

RSIHR: Optimal allocation target minimizing the total number of failures for Wald-type test
statistics at fixed power

MLE: Maximum likelihood estimate
OD: Optimal design

Table 5. Mean and standard deviation (in parentheses) of the allocation rate on arm 1 for
n = 30. The two urn models, RPW and DL, have exactly the same limiting allocation rate R_RPW.
The four sequential methods, SMLE, DBCD, SEU, and GDL, can target the following six allocation
targets: R_Wald, R_Risk, R_Odds, R_LLR, R_RSIHR, and R_RPW.

Table 6. Power and type I error at R_Wald (alpha = 0.05, n = 30). For each RAR method, the
results of the following five test statistics are shown: Agresti's correction to the Wald-type
Z test T_MW, the log-relative-risk test T_Risk, Gart's correction to the log-odds-ratio test
T_MO, Cook's correction to the Chi-square test T_MC, and Williams' correction to the
log-likelihood-ratio test T_ML.

Table 7. Power and type I error at R_Risk (alpha = 0.05, n = 30). For each RAR method, the
results of the following five test statistics are shown: Agresti's correction to the Wald-type
Z test T_MW, the log-relative-risk test T_Risk, Gart's correction to the log-odds-ratio test
T_MO, Cook's correction to the Chi-square test T_MC, and Williams' correction to the
log-likelihood-ratio test T_ML.

Table 8. Power and type I error at R_Odds (alpha = 0.05, n = 30). For each RAR method, the
results of the following five test statistics are shown: Agresti's correction to the Wald-type
Z test T_MW, the log-relative-risk test T_Risk, Gart's correction to the log-odds-ratio test
T_MO, Cook's correction to the Chi-square test T_MC, and Williams' correction to the
log-likelihood-ratio test T_ML.

Table 9. Power and type I error at R_LLR (alpha = 0.05, n = 30). For each RAR method, the
results of the following five test statistics are shown: Agresti's correction to the Wald-type
Z test T_MW, the log-relative-risk test T_Risk, Gart's correction to the log-odds-ratio test
T_MO, Cook's correction to the Chi-square test T_MC, and Williams' correction to the
log-likelihood-ratio test T_ML.

Table 10. Power and type I error at R_RSIHR (alpha = 0.05, n = 30). For each RAR method, the
results of the following five test statistics are shown: Agresti's correction to the Wald-type
Z test T_MW, the log-relative-risk test T_Risk, Gart's correction to the log-odds-ratio test
T_MO, Cook's correction to the Chi-square test T_MC, and Williams' correction to the
log-likelihood-ratio test T_ML.

Table 11. Power and type I error at R_RPW (alpha = 0.05, n = 30). For each RAR method, the
results of the following five test statistics are shown: Agresti's correction to the Wald-type
Z test T_MW, the log-relative-risk test T_Risk, Gart's correction to the log-odds-ratio test
T_MO, Cook's correction to the Chi-square test T_MC, and Williams' correction to the
log-likelihood-ratio test T_ML.

Table 12. The mean and standard deviation (in parentheses) of type I error and power calculated
by averaging simulation results over the 5 null cases and the 10 alternative cases of the
simulation scenarios. All results have been multiplied by 100% (alpha = 0.05, n = 30).

Figures and Tables


Table 1. Summary of data from a two-arm comparative clinical trial

                Response         Failure                  Margins
Treatment 1     r_1              f_1                      n_1
Treatment 2     r_2              f_2                      n - n_1 = n_2
Margins         r_1 + r_2 = r    n - r = f_1 + f_2 = f    n

n: total number of patients; n_1, n_2: patients on treatment 1 and 2; r: total number of
treatment successes; r_1, r_2: number of successes on treatment 1 and 2.


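Table 2. Test statistics

Log-relative-risk:
  $T_{Risk} = \left[\log\dfrac{f_2 n_1}{f_1 n_2}\right]^2 \Big/ \left(\dfrac{r_1}{n_1 f_1} + \dfrac{r_2}{n_2 f_2}\right)$

Log-odds-ratio:
  $T_{Odds} = \left[\log\dfrac{f_2 r_1}{f_1 r_2}\right]^2 \Big/ \left(\dfrac{1}{f_1} + \dfrac{1}{f_2} + \dfrac{1}{r_1} + \dfrac{1}{r_2}\right)$

Wald-type Z:
  $T_{Wald} = \left(\dfrac{r_1}{n_1} - \dfrac{r_2}{n_2}\right)^2 \Big/ \left(\dfrac{f_1 r_1}{n_1^3} + \dfrac{f_2 r_2}{n_2^3}\right)$

Chi-square:
  $T_{ChiSq} = \dfrac{(n-1)(r_1 f_2 - r_2 f_1)^2}{r f n_1 n_2}$

Log-likelihood-ratio:
  $T_{LLR} = 2\,(r_1 \log r_1 + r_2 \log r_2 + f_1 \log f_1 + f_2 \log f_2 - r \log r - f \log f - n_1 \log n_1 - n_2 \log n_2 + n \log n)$

Gart's correction to $T_{Risk}$ [29]:
  $T_{MO} = \left[\log\dfrac{f'_2 n'_1}{f'_1 n'_2}\right]^2 \Big/ \left(\dfrac{r'_1}{n'_1 f'_1} + \dfrac{r'_2}{n'_2 f'_2}\right)$

Agresti's correction to $T_{Wald}$:
  $T_{MW} = \left(\dfrac{r''_1}{n''_1} - \dfrac{r''_2}{n''_2}\right)^2 \Big/ \left(\dfrac{f''_1 r''_1}{{n''_1}^3} + \dfrac{f''_2 r''_2}{{n''_2}^3}\right)$

Cook's correction to $T_{ChiSq}$:
  $T_{MC} = \dfrac{(n-1)\,(|r_1 f_2 - r_2 f_1| - 0.5)^2}{r f n_1 n_2}$

Williams' correction to $T_{LLR}$ [31]:
  $T_{ML} = \left[1 + \dfrac{(n^2 - r f)(n^2 - n_1 n_2)}{6\, r f\, n\, n_1 n_2}\right]^{-1} T_{LLR}$

$r'_1 = r_1 + 0.5$, $r'_2 = r_2 + 0.5$, $f'_1 = f_1 + 0.5$, $f'_2 = f_2 + 0.5$, $r' = r + 1$, $f' = f + 1$, $n'_1 = n_1 + 1$, $n'_2 = n_2 + 1$, $n' = n + 2$
$r''_1 = r_1 + 1$, $r''_2 = r_2 + 1$, $f''_1 = f_1 + 1$, $f''_2 = f_2 + 1$, $r'' = r + 2$, $f'' = f + 2$, $n''_1 = n_1 + 2$, $n''_2 = n_2 + 2$, $n'' = n + 4$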
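
As a worked illustration of how the count-based statistics in Table 2 are evaluated, the sketch below computes the Chi-square statistic, the log-likelihood-ratio statistic, and Williams' correction from the four cell counts of Table 1. It follows one reading of the table; the function names are hypothetical, the convention 0 log 0 = 0 is an assumption, and all margins are assumed to be positive.

```python
import math

def _xlogx(x):
    """x * log(x), with the assumed convention 0 * log(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0

def _margins(r1, f1, r2, f2):
    n1, n2 = r1 + f1, r2 + f2      # patients per arm
    r, f = r1 + r2, f1 + f2        # total successes and failures
    return n1, n2, r, f, n1 + n2

def chi_square(r1, f1, r2, f2):
    """Chi-square statistic with the (n - 1) multiplier."""
    n1, n2, r, f, n = _margins(r1, f1, r2, f2)
    return (n - 1) * (r1 * f2 - r2 * f1) ** 2 / (r * f * n1 * n2)

def log_likelihood_ratio(r1, f1, r2, f2):
    """Log-likelihood-ratio statistic for the 2x2 table."""
    n1, n2, r, f, n = _margins(r1, f1, r2, f2)
    return 2.0 * (_xlogx(r1) + _xlogx(r2) + _xlogx(f1) + _xlogx(f2)
                  - _xlogx(r) - _xlogx(f) - _xlogx(n1) - _xlogx(n2) + _xlogx(n))

def williams_corrected_llr(r1, f1, r2, f2):
    """Log-likelihood-ratio statistic divided by Williams' correction factor."""
    n1, n2, r, f, n = _margins(r1, f1, r2, f2)
    q = 1.0 + (n * n - r * f) * (n * n - n1 * n2) / (6.0 * n * r * f * n1 * n2)
    return log_likelihood_ratio(r1, f1, r2, f2) / q
```

For example, a hypothetical trial with 8 successes and 7 failures on arm 1 and 3 successes and 12 failures on arm 2 gives a Chi-square statistic of about 3.47. Because the correction factor exceeds 1, the corrected statistic is always smaller than the uncorrected log-likelihood-ratio statistic.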

Table 3. Allocation targets

Optimal allocation ratio (n_1/n_2) for maximizing power
  R_Risk:             $\sqrt{p_1 q_2 / (p_2 q_1)}$
  R_Odds / R_ChiSq:   $\sqrt{p_2 q_2 / (p_1 q_1)}$
  R_Wald / R_Neyman:  $\sqrt{p_1 q_1 / (p_2 q_2)}$
  R_LLR:              $\dfrac{q_2 - p_2 \exp[(I_1 - I_2)/(p_2 - p_1)]}{-q_1 + p_1 \exp[(I_1 - I_2)/(p_2 - p_1)]}$

Other allocation targets
  R_RPW / R_DL:       $q_2 / q_1$
  R_RSIHR:            $\sqrt{p_1 / p_2}$ (minimizes the number of failures at fixed power of T_Wald)

$I_1 = p_1 \log(p_1) + q_1 \log(q_1)$, $I_2 = p_2 \log(p_2) + q_2 \log(q_2)$

Table 4. Asymptotic allocation rates on arm 1 calculated from true p_1 and p_2

p_1                0.100  0.100  0.100  0.100  0.300  0.300  0.300  0.500  0.500  0.700
p_2                0.300  0.500  0.700  0.900  0.500  0.700  0.900  0.700  0.900  0.900
R_Wald / R_Neyman  0.396  0.375  0.396  0.500  0.478  0.500  0.604  0.522  0.625  0.604
R_Risk             0.337  0.250  0.179  0.100  0.396  0.300  0.179  0.396  0.250  0.337
R_Odds / R_ChiSq   0.604  0.625  0.604  0.500  0.522  0.500  0.396  0.478  0.375  0.396
R_LLR              0.534  0.538  0.528  0.500  0.507  0.500  0.472  0.493  0.462  0.466
R_RSIHR            0.366  0.309  0.274  0.250  0.436  0.396  0.366  0.458  0.427  0.469
R_RPW / R_DL       0.438  0.357  0.250  0.100  0.417  0.300  0.125  0.375  0.167  0.250
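
The allocation targets of Table 3 and the asymptotic rates of Table 4 can be checked numerically. The sketch below is an illustration under one reading of the Table 3 formulas: it converts each target ratio n_1/n_2 into the allocation rate on arm 1, R/(1 + R). The function names and the string keys are hypothetical, and both response rates are assumed to lie strictly between 0 and 1.

```python
import math

def allocation_ratio(target, p1, p2):
    """Target allocation ratio n1/n2 for true response rates p1 and p2."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    if target in ("wald", "neyman"):
        return math.sqrt(p1 * q1 / (p2 * q2))
    if target == "risk":
        return math.sqrt(p1 * q2 / (p2 * q1))
    if target in ("odds", "chisq"):
        return math.sqrt(p2 * q2 / (p1 * q1))
    if target == "llr":
        if p1 == p2:
            return 1.0  # equal rates: treated here as balanced allocation
        i1 = p1 * math.log(p1) + q1 * math.log(q1)
        i2 = p2 * math.log(p2) + q2 * math.log(q2)
        e = math.exp((i1 - i2) / (p2 - p1))
        return (q2 - p2 * e) / (-q1 + p1 * e)
    if target == "rsihr":
        return math.sqrt(p1 / p2)
    if target in ("rpw", "dl"):
        return q2 / q1
    raise ValueError("unknown allocation target: " + target)

def allocation_rate(target, p1, p2):
    """Asymptotic proportion of patients on arm 1 implied by the target ratio."""
    ratio = allocation_ratio(target, p1, p2)
    return ratio / (1.0 + ratio)
```

For p_1 = 0.1 and p_2 = 0.3 these functions give rates that agree with the first column of Table 4 to three decimals (0.396, 0.337, 0.604, 0.534, 0.366, and 0.438 for the Wald, Risk, Odds, LLR, RSIHR, and RPW targets).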


Table 5. Mean and standard deviation (in parentheses) of the allocation rate on arm 1 for n = 30. The two urn models, RPW and DL, have
exactly the same limiting allocation rate R_RPW. The four sequential methods, SMLE, DBCD, SEU, and GDL, can target the following six
allocation targets: R_Wald, R_Risk, R_Odds, R_LLR, R_RSIHR, and R_RPW.

p_1             0.2           0.3           0.5           0.7           0.8           0.1           0.1           0.1           0.1           0.3           0.3           0.3           0.5           0.5           0.7
p_2             0.2           0.3           0.5           0.7           0.8           0.3           0.5           0.7           0.9           0.5           0.7           0.9           0.7           0.9           0.9
Urn    RPW      0.500(0.081)  0.500(0.095)  0.500(0.129)  0.500(0.179)  0.500(0.209)  0.444(0.080)  0.375(0.092)  0.287(0.096)  0.181(0.088)  0.430(0.109)  0.341(0.120)  0.227(0.123)  0.411(0.147)  0.288(0.160)  0.375(0.202)
Urn    DL       0.500(0.048)  0.500(0.058)  0.500(0.078)  0.500(0.092)  0.500(0.097)  0.447(0.046)  0.383(0.055)  0.316(0.056)  0.249(0.053)  0.437(0.067)  0.363(0.071)  0.290(0.066)  0.424(0.082)  0.343(0.082)  0.416(0.092)
SMLE   R_Wald   0.500(0.106)  0.500(0.103)  0.500(0.098)  0.500(0.103)  0.500(0.106)  0.440(0.100)  0.424(0.098)  0.441(0.100)  0.501(0.102)  0.483(0.101)  0.500(0.104)  0.559(0.100)  0.517(0.100)  0.576(0.099)  0.558(0.101)
SMLE   R_Risk   0.500(0.130)  0.500(0.134)  0.500(0.140)  0.500(0.151)  0.500(0.158)  0.397(0.117)  0.325(0.107)  0.259(0.095)  0.186(0.079)  0.415(0.133)  0.334(0.124)  0.238(0.109)  0.411(0.139)  0.298(0.131)  0.375(0.149)
SMLE   R_Odds   0.500(0.109)  0.500(0.098)  0.500(0.091)  0.500(0.099)  0.500(0.109)  0.562(0.110)  0.577(0.107)  0.561(0.110)  0.499(0.126)  0.517(0.095)  0.500(0.098)  0.438(0.109)  0.485(0.095)  0.423(0.107)  0.438(0.109)
SMLE   R_LLR    0.500(0.093)  0.500(0.092)  0.500(0.091)  0.500(0.093)  0.500(0.094)  0.519(0.094)  0.522(0.094)  0.515(0.094)  0.499(0.095)  0.506(0.092)  0.499(0.091)  0.483(0.093)  0.495(0.092)  0.477(0.094)  0.481(0.094)
SMLE   R_RSIHR  0.500(0.117)  0.500(0.116)  0.500(0.109)  0.500(0.106)  0.500(0.102)  0.417(0.108)  0.369(0.100)  0.335(0.093)  0.312(0.087)  0.447(0.112)  0.408(0.107)  0.378(0.103)  0.459(0.106)  0.429(0.105)  0.468(0.101)
SMLE   R_RPW    0.500(0.100)  0.500(0.109)  0.500(0.131)  0.500(0.166)  0.500(0.192)  0.447(0.099)  0.384(0.105)  0.297(0.106)  0.179(0.091)  0.434(0.117)  0.343(0.122)  0.209(0.110)  0.405(0.141)  0.255(0.136)  0.332(0.174)
DBCD   R_Wald   0.500(0.090)  0.500(0.075)  0.500(0.055)  0.500(0.075)  0.500(0.090)  0.417(0.081)  0.393(0.073)  0.416(0.081)  0.499(0.095)  0.475(0.065)  0.500(0.075)  0.585(0.081)  0.525(0.065)  0.607(0.073)  0.584(0.081)
DBCD   R_Risk   0.500(0.126)  0.500(0.124)  0.500(0.123)  0.500(0.127)  0.500(0.140)  0.371(0.106)  0.285(0.086)  0.216(0.071)  0.138(0.054)  0.394(0.116)  0.300(0.104)  0.187(0.083)  0.391(0.118)  0.250(0.108)  0.337(0.130)
DBCD   R_Odds   0.500(0.082)  0.500(0.061)  0.500(0.047)  0.500(0.061)  0.500(0.082)  0.585(0.085)  0.607(0.078)  0.586(0.086)  0.499(0.110)  0.520(0.053)  0.501(0.061)  0.413(0.086)  0.480(0.054)  0.394(0.079)  0.414(0.084)
DBCD   R_LLR    0.500(0.049)  0.500(0.046)  0.500(0.044)  0.500(0.047)  0.500(0.049)  0.474(0.048)  0.468(0.046)  0.477(0.047)  0.500(0.047)  0.493(0.045)  0.500(0.046)  0.524(0.047)  0.508(0.045)  0.532(0.046)  0.527(0.048)
DBCD   R_RSIHR  0.500(0.107)  0.500(0.099)  0.500(0.078)  0.500(0.060)  0.500(0.054)  0.392(0.093)  0.332(0.077)  0.297(0.069)  0.273(0.063)  0.431(0.088)  0.387(0.080)  0.353(0.075)  0.453(0.069)  0.417(0.066)  0.464(0.055)
DBCD   R_RPW    0.500(0.064)  0.500(0.074)  0.500(0.104)  0.500(0.148)  0.500(0.185)  0.440(0.063)  0.366(0.072)  0.266(0.078)  0.129(0.064)  0.422(0.087)  0.317(0.095)  0.157(0.082)  0.386(0.118)  0.201(0.112)  0.284(0.158)
SEU    R_Wald   0.500(0.113)  0.500(0.106)  0.500(0.098)  0.500(0.106)  0.500(0.114)  0.476(0.113)  0.464(0.110)  0.473(0.113)  0.505(0.117)  0.493(0.104)  0.502(0.106)  0.535(0.108)  0.509(0.102)  0.540(0.102)  0.532(0.108)
SEU    R_Risk   0.500(0.155)  0.500(0.168)  0.500(0.195)  0.500(0.223)  0.500(0.237)  0.433(0.143)  0.361(0.130)  0.296(0.115)  0.234(0.091)  0.440(0.166)  0.365(0.154)  0.280(0.126)  0.437(0.197)  0.337(0.171)  0.411(0.212)
SEU    R_Odds   0.500(0.101)  0.500(0.104)  0.500(0.130)  0.500(0.176)  0.500(0.196)  0.514(0.108)  0.497(0.124)  0.462(0.143)  0.388(0.137)  0.489(0.119)  0.453(0.134)  0.384(0.131)  0.469(0.150)  0.399(0.146)  0.438(0.177)
SEU    R_LLR    0.500(0.093)  0.500(0.091)  0.500(0.091)  0.500(0.093)  0.500(0.092)  0.510(0.093)  0.512(0.094)  0.508(0.093)  0.501(0.094)  0.503(0.092)  0.500(0.091)  0.493(0.094)  0.498(0.093)  0.490(0.094)  0.490(0.092)
SEU    R_RSIHR  0.500(0.149)  0.500(0.146)  0.500(0.131)  0.500(0.116)  0.500(0.106)  0.461(0.143)  0.425(0.130)  0.402(0.122)  0.383(0.113)  0.475(0.136)  0.449(0.126)  0.429(0.121)  0.479(0.124)  0.460(0.117)  0.481(0.109)
SEU    R_RPW    0.500(0.135)  0.500(0.155)  0.500(0.192)  0.500(0.222)  0.500(0.233)  0.469(0.129)  0.424(0.136)  0.367(0.135)  0.294(0.113)  0.462(0.164)  0.408(0.162)  0.326(0.141)  0.456(0.197)  0.366(0.173)  0.423(0.208)
GDL    R_Wald   0.500(0.056)  0.500(0.046)  0.500(0.033)  0.500(0.047)  0.500(0.056)  0.450(0.051)  0.437(0.046)  0.452(0.051)  0.500(0.058)  0.486(0.040)  0.499(0.047)  0.548(0.052)  0.514(0.041)  0.562(0.046)  0.548(0.051)
GDL    R_Risk   0.500(0.106)  0.500(0.114)  0.500(0.128)  0.500(0.144)  0.500(0.154)  0.397(0.093)  0.320(0.085)  0.251(0.071)  0.181(0.055)  0.407(0.114)  0.319(0.104)  0.220(0.078)  0.397(0.128)  0.274(0.104)  0.356(0.138)
GDL    R_Odds   0.500(0.040)  0.500(0.035)  0.500(0.055)  0.500(0.090)  0.500(0.112)  0.527(0.043)  0.508(0.053)  0.454(0.072)  0.341(0.080)  0.484(0.045)  0.431(0.064)  0.327(0.072)  0.447(0.071)  0.342(0.080)  0.390(0.102)
GDL    R_LLR    0.500(0.029)  0.500(0.026)  0.500(0.024)  0.500(0.026)  0.500(0.029)  0.517(0.027)  0.521(0.026)  0.515(0.027)  0.500(0.028)  0.505(0.024)  0.500(0.026)  0.485(0.027)  0.495(0.025)  0.479(0.026)  0.483(0.028)
GDL    R_RSIHR  0.500(0.073)  0.500(0.070)  0.500(0.058)  0.500(0.045)  0.500(0.039)  0.431(0.065)  0.389(0.057)  0.362(0.051)  0.342(0.047)  0.454(0.062)  0.423(0.056)  0.398(0.052)  0.466(0.052)  0.440(0.046)  0.472(0.038)
GDL    R_RPW    0.500(0.053)  0.500(0.065)  0.500(0.088)  0.500(0.116)  0.500(0.133)  0.454(0.052)  0.399(0.063)  0.329(0.067)  0.236(0.059)  0.444(0.075)  0.367(0.082)  0.263(0.073)  0.420(0.098)  0.303(0.092)  0.370(0.121)

Table 6. Power and type I error at R_Wald (alpha = 0.05, n = 30). For each RAR method, the
results of the following five test statistics are shown: Agresti's correction to the Wald-type
Z test T_MW, the log-relative-risk test T_Risk, Gart's correction to the log-odds-ratio test
T_MO, Cook's correction to the Chi-square test T_MC, and Williams' correction to the
log-likelihood-ratio test T_ML.

p_1           0.200  0.300  0.500  0.700  0.800  0.100  0.100  0.100  0.100  0.300  0.300  0.300  0.500  0.500  0.700
p_2           0.200  0.300  0.500  0.700  0.800  0.300  0.500  0.700  0.900  0.500  0.700  0.900  0.700  0.900  0.900
SMLE  T_MW    0.031  0.048  0.056  0.050  0.033  0.196  0.674  0.953  0.999  0.201  0.600  0.950  0.203  0.680  0.202
SMLE  T_Risk  0.102  0.072  0.039  0.014  0.003  0.326  0.693  0.940  0.996  0.181  0.501  0.798  0.113  0.288  0.024
SMLE  T_MO    0.007  0.022  0.041  0.024  0.007  0.063  0.492  0.928  0.999  0.162  0.563  0.923  0.161  0.495  0.069
SMLE  T_MC    0.044  0.052  0.056  0.055  0.044  0.231  0.689  0.954  0.999  0.203  0.601  0.952  0.205  0.693  0.235
SMLE  T_ML    0.074  0.066  0.055  0.067  0.079  0.308  0.709  0.954  0.999  0.203  0.595  0.951  0.205  0.711  0.309
DBCD  T_MW    0.029  0.050  0.057  0.052  0.026  0.186  0.685  0.957  0.999  0.212  0.607  0.958  0.206  0.696  0.191
DBCD  T_Risk  0.120  0.085  0.041  0.008  0.001  0.361  0.721  0.954  0.998  0.204  0.524  0.811  0.109  0.257  0.010
DBCD  T_MO    0.004  0.017  0.045  0.017  0.003  0.041  0.462  0.933  0.999  0.169  0.587  0.934  0.164  0.475  0.042
DBCD  T_MC    0.037  0.056  0.058  0.056  0.034  0.211  0.696  0.958  0.999  0.215  0.607  0.959  0.208  0.706  0.215
DBCD  T_ML    0.077  0.074  0.059  0.073  0.077  0.311  0.718  0.958  0.999  0.217  0.607  0.959  0.210  0.727  0.315
SEU   T_MW    0.031  0.045  0.048  0.044  0.030  0.200  0.655  0.946  0.999  0.190  0.583  0.948  0.191  0.675  0.213
SEU   T_Risk  0.067  0.048  0.033  0.016  0.006  0.259  0.646  0.922  0.991  0.154  0.486  0.812  0.114  0.342  0.046
SEU   T_MO    0.013  0.026  0.039  0.027  0.011  0.094  0.522  0.921  0.999  0.158  0.553  0.926  0.157  0.533  0.095
SEU   T_MC    0.046  0.051  0.049  0.050  0.046  0.248  0.675  0.949  0.999  0.195  0.585  0.950  0.195  0.698  0.258
SEU   T_ML    0.062  0.055  0.047  0.055  0.062  0.285  0.683  0.947  0.999  0.190  0.577  0.949  0.193  0.710  0.305
GDL   T_MW    0.036  0.051  0.051  0.049  0.034  0.223  0.696  0.954  1.000  0.195  0.601  0.958  0.200  0.692  0.214
GDL   T_Risk  0.075  0.060  0.040  0.010  0.001  0.309  0.703  0.949  0.999  0.184  0.543  0.868  0.124  0.304  0.015
GDL   T_MO    0.007  0.022  0.046  0.023  0.006  0.077  0.549  0.937  0.999  0.167  0.588  0.945  0.169  0.547  0.077
GDL   T_MC    0.048  0.057  0.051  0.055  0.047  0.260  0.708  0.955  1.000  0.198  0.602  0.960  0.204  0.705  0.253
GDL   T_ML    0.074  0.064  0.052  0.063  0.076  0.319  0.721  0.956  1.000  0.200  0.602  0.960  0.205  0.720  0.314

Table 7. Power and type I error at R_Risk (alpha = 0.05, n = 30). For each RAR method, the
results of the following five test statistics are shown: Agresti's correction to the Wald-type
Z test T_MW, the log-relative-risk test T_Risk, Gart's correction to the log-odds-ratio test
T_MO, Cook's correction to the Chi-square test T_MC, and Williams' correction to the
log-likelihood-ratio test T_ML.

p_1           0.200  0.300  0.500  0.700  0.800  0.100  0.100  0.100  0.100  0.300  0.300  0.300  0.500  0.500  0.700
p_2           0.200  0.300  0.500  0.700  0.800  0.300  0.500  0.700  0.900  0.500  0.700  0.900  0.700  0.900  0.900
SMLE  T_MW    0.024  0.045  0.061  0.051  0.041  0.156  0.615  0.923  0.990  0.185  0.560  0.898  0.189  0.611  0.214
SMLE  T_Risk  0.136  0.105  0.078  0.061  0.050  0.363  0.716  0.945  0.997  0.230  0.588  0.923  0.206  0.612  0.210
SMLE  T_MO    0.002  0.008  0.032  0.039  0.040  0.022  0.278  0.792  0.988  0.096  0.466  0.903  0.157  0.615  0.220
SMLE  T_MC    0.033  0.047  0.060  0.064  0.068  0.177  0.615  0.923  0.996  0.183  0.570  0.939  0.202  0.701  0.316
SMLE  T_ML    0.069  0.071  0.061  0.049  0.051  0.278  0.659  0.921  0.975  0.195  0.543  0.883  0.179  0.621  0.253
DBCD  T_MW    0.018  0.046  0.072  0.054  0.042  0.134  0.617  0.931  0.993  0.198  0.565  0.896  0.199  0.586  0.207
DBCD  T_Risk  0.166  0.123  0.091  0.066  0.062  0.402  0.744  0.951  0.998  0.253  0.606  0.926  0.225  0.649  0.243
DBCD  T_MO    0.001  0.003  0.030  0.046  0.049  0.004  0.164  0.746  0.994  0.074  0.457  0.904  0.158  0.623  0.248
DBCD  T_MC    0.023  0.047  0.070  0.068  0.077  0.148  0.612  0.928  0.998  0.193  0.575  0.940  0.218  0.707  0.327
DBCD  T_ML    0.071  0.083  0.071  0.050  0.050  0.278  0.665  0.928  0.979  0.207  0.549  0.880  0.184  0.596  0.240
SEU   T_MW    0.026  0.039  0.045  0.043  0.032  0.172  0.598  0.903  0.988  0.178  0.537  0.888  0.183  0.606  0.198
SEU   T_Risk  0.105  0.092  0.075  0.059  0.049  0.307  0.686  0.935  0.996  0.201  0.546  0.903  0.186  0.581  0.193
SEU   T_MO    0.009  0.018  0.029  0.027  0.023  0.062  0.372  0.794  0.986  0.121  0.468  0.887  0.146  0.582  0.176
SEU   T_MC    0.041  0.044  0.050  0.064  0.070  0.209  0.605  0.903  0.994  0.178  0.542  0.922  0.194  0.681  0.289
SEU   T_ML    0.057  0.052  0.047  0.049  0.048  0.266  0.640  0.900  0.981  0.183  0.526  0.879  0.178  0.624  0.245
GDL   T_MW    0.023  0.043  0.059  0.047  0.038  0.168  0.617  0.929  0.993  0.182  0.558  0.902  0.196  0.580  0.195
GDL   T_Risk  0.113  0.092  0.076  0.062  0.053  0.347  0.720  0.950  0.998  0.227  0.593  0.928  0.220  0.617  0.213
GDL   T_MO    0.001  0.006  0.031  0.040  0.042  0.016  0.283  0.831  0.994  0.094  0.473  0.908  0.161  0.604  0.220
GDL   T_MC    0.030  0.047  0.058  0.064  0.070  0.194  0.618  0.928  0.998  0.180  0.567  0.943  0.214  0.696  0.311
GDL   T_ML    0.077  0.068  0.058  0.044  0.045  0.292  0.653  0.927  0.990  0.189  0.540  0.901  0.182  0.606  0.236

Table 8. Power and type I error at R_Odds (alpha = 0.05, n = 30). For each RAR method, the
results of the following five test statistics are shown: Agresti's correction to the Wald-type
Z test T_MW, the log-relative-risk test T_Risk, Gart's correction to the log-odds-ratio test
T_MO, Cook's correction to the Chi-square test T_MC, and Williams' correction to the
log-likelihood-ratio test T_ML.

p_1           0.200  0.300  0.500  0.700  0.800  0.100  0.100  0.100  0.100  0.300  0.300  0.300  0.500  0.500  0.700
p_2           0.200  0.300  0.500  0.700  0.800  0.300  0.500  0.700  0.900  0.500  0.700  0.900  0.700  0.900  0.900
SMLE  T_MW    0.030  0.040  0.042  0.040  0.031  0.202  0.630  0.935  0.998  0.178  0.562  0.939  0.174  0.637  0.205
SMLE  T_Risk  0.022  0.023  0.030  0.026  0.017  0.143  0.502  0.857  0.984  0.128  0.475  0.884  0.129  0.497  0.112
SMLE  T_MO    0.024  0.031  0.036  0.031  0.023  0.163  0.587  0.926  0.999  0.154  0.536  0.929  0.151  0.598  0.167
SMLE  T_MC    0.053  0.048  0.043  0.047  0.052  0.283  0.682  0.946  0.999  0.184  0.566  0.947  0.180  0.690  0.285
SMLE  T_ML    0.048  0.045  0.040  0.044  0.049  0.266  0.662  0.938  0.998  0.174  0.551  0.941  0.171  0.672  0.270
DBCD  T_MW    0.029  0.040  0.044  0.040  0.028  0.191  0.632  0.940  0.999  0.180  0.572  0.941  0.178  0.644  0.198
DBCD  T_Risk  0.011  0.018  0.032  0.026  0.018  0.085  0.448  0.864  0.994  0.120  0.490  0.906  0.141  0.547  0.134
DBCD  T_MO    0.026  0.033  0.042  0.031  0.024  0.178  0.609  0.934  0.999  0.165  0.555  0.933  0.161  0.619  0.185
DBCD  T_MC    0.052  0.046  0.045  0.046  0.048  0.280  0.688  0.948  0.999  0.185  0.573  0.949  0.181  0.696  0.284
DBCD  T_ML    0.040  0.043  0.043  0.043  0.038  0.244  0.667  0.945  0.999  0.178  0.565  0.944  0.174  0.680  0.252
SEU   T_MW    0.032  0.041  0.043  0.037  0.030  0.207  0.647  0.935  0.996  0.183  0.562  0.924  0.186  0.636  0.204
SEU   T_Risk  0.047  0.040  0.035  0.032  0.028  0.214  0.605  0.903  0.993  0.152  0.503  0.894  0.140  0.528  0.146
SEU   T_MO    0.014  0.026  0.032  0.023  0.020  0.127  0.540  0.900  0.995  0.148  0.520  0.914  0.150  0.587  0.159
SEU   T_MC    0.049  0.047  0.043  0.047  0.052  0.268  0.676  0.938  0.998  0.187  0.564  0.945  0.191  0.695  0.284
SEU   T_ML    0.059  0.049  0.042  0.044  0.049  0.285  0.677  0.935  0.995  0.182  0.551  0.922  0.183  0.665  0.268
GDL   T_MW    0.029  0.037  0.049  0.041  0.030  0.203  0.657  0.943  0.999  0.167  0.573  0.929  0.178  0.617  0.192
GDL   T_Risk  0.024  0.032  0.046  0.035  0.031  0.183  0.625  0.936  0.999  0.158  0.560  0.922  0.165  0.583  0.166
GDL   T_MO    0.013  0.026  0.043  0.034  0.033  0.124  0.587  0.930  0.999  0.150  0.552  0.928  0.161  0.619  0.204
GDL   T_MC    0.051  0.047  0.050  0.050  0.058  0.281  0.700  0.948  0.999  0.177  0.579  0.949  0.187  0.695  0.298
GDL   T_ML    0.050  0.047  0.046  0.039  0.043  0.282  0.700  0.947  0.999  0.176  0.563  0.933  0.169  0.652  0.258

Table 9. Power and type I error at Rllr (alpha = 0.05, n = 30). For each RAR method, the results
of the following 5 test statistics are shown: Agresti's correction to Wald-type Z test Tmw,
log-relative-risk test TRisk, Gart's correction to log-odds-ratio test Tmo, Cook's correction to
Chi-square test Tmc, and Williams' correction log-likelihood-ratio test Tml.

               P1:  0.200  0.300  0.500  0.700  0.800  0.100  0.100  0.100  0.100  0.300  0.300  0.300  0.500  0.500  0.700
Method  Test   P2:  0.200  0.300  0.500  0.700  0.800  0.300  0.500  0.700  0.900  0.500  0.700  0.900  0.700  0.900  0.900
SMLE    Tmw         0.034  0.043  0.046  0.044  0.031  0.212  0.659  0.946  0.999  0.187  0.575  0.948  0.182  0.667  0.218
SMLE    TRisk       0.039  0.034  0.033  0.022  0.008  0.203  0.597  0.911  0.995  0.146  0.490  0.869  0.124  0.432  0.072
SMLE    Tmo         0.018  0.029  0.040  0.031  0.017  0.129  0.577  0.931  0.999  0.162  0.549  0.934  0.156  0.587  0.133
SMLE    Tmc         0.052  0.050  0.046  0.052  0.051  0.274  0.692  0.951  0.999  0.192  0.578  0.953  0.185  0.700  0.278
SMLE    Tml         0.060  0.050  0.044  0.051  0.057  0.289  0.691  0.948  0.999  0.186  0.567  0.950  0.181  0.698  0.289
DBCD    Tmw         0.036  0.047  0.050  0.045  0.031  0.223  0.688  0.957  0.999  0.192  0.591  0.956  0.192  0.697  0.225
DBCD    TRisk       0.063  0.049  0.037  0.012  0.001  0.278  0.686  0.947  0.998  0.171  0.528  0.872  0.129  0.356  0.026
DBCD    Tmo         0.010  0.028  0.046  0.026  0.009  0.094  0.569  0.946  0.999  0.169  0.579  0.942  0.171  0.580  0.094
DBCD    Tmc         0.050  0.055  0.051  0.052  0.044  0.265  0.710  0.959  0.999  0.197  0.592  0.959  0.197  0.715  0.267
DBCD    Tml         0.071  0.062  0.051  0.057  0.066  0.315  0.727  0.960  0.999  0.198  0.591  0.959  0.199  0.733  0.316
SEU     Tmw         0.034  0.043  0.046  0.043  0.033  0.215  0.665  0.947  0.999  0.187  0.581  0.947  0.186  0.671  0.214
SEU     TRisk       0.047  0.038  0.031  0.018  0.007  0.226  0.617  0.915  0.995  0.148  0.492  0.854  0.125  0.414  0.063
SEU     Tmo         0.016  0.027  0.038  0.028  0.013  0.124  0.573  0.931  0.999  0.161  0.553  0.929  0.157  0.574  0.123
SEU     Tmc         0.052  0.049  0.047  0.050  0.050  0.276  0.696  0.952  0.999  0.191  0.583  0.951  0.191  0.701  0.270
SEU     Tml         0.063  0.051  0.044  0.052  0.061  0.294  0.696  0.949  0.999  0.186  0.573  0.948  0.186  0.701  0.292
GDL     Tmw         0.033  0.037  0.043  0.038  0.032  0.230  0.670  0.950  1.000  0.178  0.585  0.956  0.177  0.675  0.215
GDL     TRisk       0.035  0.032  0.036  0.018  0.005  0.230  0.645  0.937  0.999  0.151  0.537  0.905  0.139  0.449  0.049
GDL     Tmo         0.016  0.030  0.043  0.031  0.014  0.139  0.614  0.945  1.000  0.172  0.582  0.951  0.172  0.612  0.127
GDL     Tmc         0.052  0.050  0.044  0.048  0.053  0.293  0.719  0.955  1.000  0.189  0.588  0.960  0.186  0.722  0.275
GDL     Tml         0.063  0.051  0.044  0.049  0.064  0.322  0.722  0.955  1.000  0.189  0.587  0.960  0.187  0.728  0.302

Table 10. Power and type I error at Rrsihr (alpha = 0.05, n = 30). For each RAR method, the
results of the following 5 test statistics are shown: Agresti's correction to Wald-type Z test Tmw,
log-relative-risk test TRisk, Gart's correction to log-odds-ratio test Tmo, Cook's correction to
Chi-square test Tmc, and Williams' correction log-likelihood-ratio test Tml.

               P1:  0.200  0.300  0.500  0.700  0.800  0.100  0.100  0.100  0.100  0.300  0.300  0.300  0.500  0.500  0.700
Method  Test   P2:  0.200  0.300  0.500  0.700  0.800  0.300  0.500  0.700  0.900  0.500  0.700  0.900  0.700  0.900  0.900
SMLE    Tmw         0.028  0.045  0.056  0.048  0.035  0.174  0.648  0.944  0.999  0.192  0.588  0.946  0.202  0.678  0.228
SMLE    TRisk       0.118  0.085  0.058  0.034  0.018  0.343  0.712  0.950  0.999  0.207  0.568  0.910  0.172  0.515  0.102
SMLE    Tmo         0.004  0.012  0.040  0.034  0.023  0.037  0.397  0.890  0.998  0.130  0.538  0.936  0.170  0.616  0.156
SMLE    Tmc         0.038  0.049  0.056  0.057  0.057  0.200  0.657  0.945  0.999  0.192  0.591  0.953  0.208  0.718  0.290
SMLE    Tml         0.070  0.065  0.056  0.054  0.062  0.291  0.685  0.945  0.998  0.196  0.579  0.946  0.197  0.705  0.301
DBCD    Tmw         0.020  0.050  0.057  0.050  0.038  0.157  0.654  0.948  0.999  0.201  0.605  0.956  0.217  0.700  0.242
DBCD    TRisk       0.138  0.103  0.062  0.030  0.013  0.383  0.732  0.953  0.999  0.227  0.594  0.922  0.186  0.534  0.097
DBCD    Tmo         0.001  0.007  0.038  0.034  0.020  0.017  0.323  0.887  0.999  0.123  0.554  0.942  0.185  0.628  0.159
DBCD    Tmc         0.028  0.056  0.057  0.057  0.060  0.183  0.662  0.948  0.999  0.202  0.607  0.959  0.221  0.733  0.304
DBCD    Tml         0.074  0.079  0.057  0.052  0.064  0.293  0.693  0.948  0.999  0.208  0.593  0.954  0.207  0.726  0.317
SEU     Tmw         0.029  0.039  0.050  0.044  0.033  0.181  0.626  0.930  0.998  0.178  0.559  0.932  0.182  0.653  0.214
SEU     TRisk       0.095  0.070  0.044  0.024  0.010  0.275  0.650  0.926  0.996  0.163  0.512  0.875  0.137  0.449  0.071
SEU     Tmo         0.014  0.021  0.037  0.028  0.016  0.075  0.466  0.892  0.997  0.137  0.521  0.921  0.152  0.574  0.128
SEU     Tmc         0.044  0.045  0.050  0.053  0.049  0.225  0.642  0.932  0.998  0.181  0.562  0.945  0.189  0.696  0.271
SEU     Tml         0.058  0.053  0.050  0.052  0.062  0.268  0.657  0.929  0.997  0.178  0.548  0.934  0.179  0.684  0.289
GDL     Tmw         0.031  0.048  0.052  0.050  0.036  0.206  0.682  0.951  1.000  0.197  0.610  0.961  0.212  0.690  0.235
GDL     TRisk       0.084  0.065  0.050  0.026  0.009  0.321  0.715  0.952  1.000  0.201  0.591  0.919  0.173  0.495  0.076
GDL     Tmo         0.002  0.016  0.042  0.034  0.017  0.047  0.476  0.923  1.000  0.147  0.577  0.947  0.186  0.613  0.142
GDL     Tmc         0.040  0.052  0.052  0.056  0.053  0.228  0.689  0.952  1.000  0.198  0.611  0.964  0.216  0.721  0.289
GDL     Tml         0.074  0.062  0.051  0.055  0.063  0.301  0.707  0.952  1.000  0.199  0.602  0.962  0.207  0.722  0.316

Table 11. Power and type I error at Rrpw (alpha = 0.05, n = 30). For each RAR method, the
results of the following 5 test statistics are shown: Agresti's correction to Wald-type Z test Tmw,
log-relative-risk test TRisk, Gart's correction to log-odds-ratio test Tmo, Cook's correction to
Chi-square test Tmc, and Williams' correction log-likelihood-ratio test Tml.

               P1:  0.200  0.300  0.500  0.700  0.800  0.100  0.100  0.100  0.100  0.300  0.300  0.300  0.500  0.500  0.700
Method  Test   P2:  0.200  0.300  0.500  0.700  0.800  0.300  0.500  0.700  0.900  0.500  0.700  0.900  0.700  0.900  0.900
RPW     Tmw         0.031  0.039  0.050  0.050  0.042  0.191  0.631  0.918  0.966  0.166  0.538  0.859  0.183  0.585  0.204
RPW     TRisk       0.071  0.058  0.059  0.061  0.060  0.287  0.683  0.939  0.993  0.193  0.565  0.905  0.197  0.607  0.216
RPW     Tmo         0.004  0.012  0.032  0.038  0.039  0.047  0.410  0.840  0.967  0.105  0.467  0.867  0.151  0.584  0.196
RPW     Tmc         0.045  0.042  0.050  0.063  0.075  0.227  0.640  0.921  0.988  0.167  0.546  0.914  0.196  0.680  0.301
RPW     Tml         0.067  0.050  0.049  0.049  0.053  0.288  0.661  0.916  0.931  0.172  0.523  0.820  0.173  0.573  0.235
DL      Tmw         0.032  0.043  0.052  0.050  0.040  0.208  0.658  0.944  0.998  0.183  0.586  0.939  0.204  0.658  0.219
DL      TRisk       0.057  0.051  0.055  0.048  0.032  0.273  0.679  0.947  0.998  0.192  0.588  0.935  0.199  0.612  0.164
DL      Tmo         0.003  0.013  0.038  0.041  0.033  0.047  0.464  0.906  0.998  0.123  0.527  0.934  0.172  0.641  0.193
DL      Tmc         0.043  0.045  0.052  0.062  0.064  0.237  0.662  0.944  0.999  0.184  0.592  0.956  0.216  0.723  0.307
DL      Tml         0.058  0.050  0.050  0.049  0.056  0.275  0.672  0.943  0.998  0.183  0.567  0.940  0.188  0.688  0.283
SMLE    Tmw         0.027  0.040  0.048  0.049  0.044  0.188  0.626  0.921  0.968  0.167  0.537  0.848  0.175  0.550  0.195
SMLE    TRisk       0.073  0.062  0.058  0.063  0.072  0.283  0.678  0.936  0.993  0.193  0.563  0.910  0.196  0.617  0.247
SMLE    Tmo         0.006  0.012  0.031  0.040  0.049  0.054  0.409  0.840  0.969  0.108  0.463  0.864  0.148  0.584  0.229
SMLE    Tmc         0.039  0.044  0.049  0.061  0.079  0.226  0.636  0.922  0.989  0.168  0.547  0.911  0.190  0.671  0.315
SMLE    Tml         0.064  0.054  0.046  0.046  0.047  0.287  0.659  0.917  0.925  0.171  0.519  0.794  0.165  0.528  0.200
DBCD    Tmw         0.031  0.037  0.053  0.049  0.044  0.202  0.635  0.929  0.969  0.181  0.529  0.813  0.173  0.503  0.192
DBCD    TRisk       0.063  0.054  0.065  0.072  0.081  0.290  0.685  0.942  0.994  0.202  0.572  0.911  0.209  0.640  0.285
DBCD    Tmo         0.003  0.010  0.033  0.043  0.054  0.041  0.407  0.866  0.981  0.110  0.460  0.856  0.146  0.573  0.257
DBCD    Tmc         0.041  0.040  0.054  0.067  0.083  0.236  0.640  0.930  0.990  0.181  0.543  0.905  0.195  0.660  0.325
DBCD    Tml         0.061  0.048  0.052  0.042  0.036  0.289  0.661  0.925  0.857  0.183  0.511  0.696  0.160  0.407  0.144
SEU     Tmw         0.033  0.040  0.047  0.041  0.032  0.204  0.633  0.924  0.994  0.183  0.553  0.908  0.185  0.618  0.199
SEU     TRisk       0.076  0.059  0.058  0.048  0.043  0.278  0.664  0.929  0.996  0.183  0.529  0.899  0.170  0.564  0.182
SEU     Tmo         0.012  0.021  0.028  0.027  0.024  0.100  0.467  0.855  0.993  0.130  0.493  0.900  0.143  0.578  0.169
SEU     Tmc         0.051  0.047  0.050  0.059  0.065  0.251  0.652  0.925  0.997  0.186  0.556  0.933  0.197  0.686  0.286
SEU     Tml         0.062  0.051  0.048  0.047  0.049  0.293  0.671  0.923  0.992  0.185  0.541  0.904  0.183  0.642  0.251
GDL     Tmw         0.032  0.045  0.049  0.045  0.032  0.216  0.658  0.937  0.998  0.171  0.576  0.916  0.192  0.602  0.196
GDL     TRisk       0.056  0.053  0.053  0.050  0.042  0.281  0.681  0.942  0.998  0.180  0.586  0.927  0.196  0.615  0.197
GDL     Tmo         0.004  0.017  0.036  0.040  0.037  0.066  0.480  0.900  0.998  0.122  0.525  0.918  0.165  0.622  0.219
GDL     Tmc         0.044  0.049  0.050  0.058  0.061  0.250  0.666  0.939  0.999  0.173  0.584  0.948  0.206  0.700  0.314
GDL     Tml         0.061  0.054  0.047  0.044  0.043  0.294  0.681  0.937  0.998  0.175  0.560  0.920  0.179  0.639  0.256

Table 12. The mean and standard deviation (in parentheses) of type I error and power calculated
by averaging simulation results over the 5 null cases and the 10 alternative cases of simulation
scenarios. All results have been multiplied by 100% (alpha = 0.05, n = 30).

                Type I error                                                      Power
Target  Method  Tmw        TRisk      Tmo        Tmc        Tml        Row Mean   Tmw         TRisk       Tmo         Tmc         Tml         Row Mean
R-Wald  SMLE    4.4(1.1)   4.6(4.1)   2.0(1.4)   5.0(0.6)   6.8(0.9)   4.6(2.4)   56.6(34.1)  48.6(35.2)  48.5(36.8)  57.6(33.4)  59.4(31.9)  54.2(33.2)
R-Wald  DBCD    4.3(1.4)   5.1(5.1)   1.7(1.7)   4.8(1.2)   7.2(0.8)   4.6(2.9)   56.9(34.4)  49.5(35.9)  48.0(37.6)  57.7(33.9)  60.2(31.8)  54.5(33.7)
R-Wald  SEU     4.0(0.9)   3.4(2.4)   2.3(1.2)   4.8(0.2)   5.6(0.6)   4.0(1.7)   56.0(34.0)  47.7(34.8)  49.6(36.1)  57.5(33.0)  58.4(32.3)  53.8(32.9)
R-Wald  GDL     4.4(0.8)   3.7(3.1)   2.1(1.6)   5.2(0.4)   6.6(1.0)   4.4(2.2)   57.3(34.0)  50.0(36.2)  50.6(36.9)  58.4(33.2)  60.0(32.0)  55.3(33.3)
R-Wald  Mean    4.3(1.0)   4.2(3.6)   2.0(1.4)   5.0(0.7)   6.5(1.0)   4.4(2.3)   56.7(32.8)  49.0(34.2)  49.2(35.4)  57.8(32.1)  59.5(30.7)  54.4(33.0)
RRisk   SMLE    4.4(1.4)   8.6(3.5)   2.4(1.8)   5.5(1.4)   6.0(1.0)   5.4(2.8)   53.4(33.2)  57.9(31.5)  45.4(35.2)  56.2(32.7)  55.1(31.1)  53.6(31.7)
RRisk   DBCD    4.6(2.0)   10.2(4.4)  2.6(2.3)   5.7(2.2)   6.5(1.4)   5.9(3.5)   53.3(33.4)  60.0(30.5)  43.7(36.0)  56.5(32.9)  55.0(31.1)  53.7(31.9)
RRisk   SEU     3.7(0.8)   7.6(2.3)   2.1(0.8)   5.4(1.3)   5.1(0.4)   4.8(2.2)   52.5(32.8)  55.3(32.2)  45.9(34.1)  55.2(32.1)  54.2(31.2)  52.6(31.3)
RRisk   GDL     4.2(1.3)   7.9(2.4)   2.4(1.9)   5.4(1.6)   5.8(1.4)   5.1(2.5)   53.2(33.3)  58.1(31.6)  45.8(35.8)  56.5(32.6)  55.2(31.7)  53.8(31.9)
RRisk   Mean    4.2(1.3)   8.6(3.1)   2.4(1.7)   5.5(1.5)   5.9(1.2)   5.3(2.8)   53.1(31.9)  57.8(30.3)  45.2(33.9)  56.1(31.3)  54.9(30.1)  53.4(31.5)
R-Odds  SMLE    3.7(0.6)   2.4(0.5)   2.9(0.5)   4.8(0.4)   4.5(0.4)   3.7(1.0)   54.6(33.9)  47.1(34.3)  52.1(34.9)  57.6(32.6)  56.4(32.9)  53.6(32.5)
R-Odds  DBCD    3.6(0.7)   2.1(0.8)   3.1(0.7)   4.7(0.3)   4.1(0.2)   3.5(1.1)   54.8(34.2)  47.3(35.2)  53.4(34.5)  57.8(32.7)  56.5(33.4)  53.9(32.8)
R-Odds  SEU     3.6(0.5)   3.6(0.8)   2.3(0.7)   4.7(0.3)   4.9(0.7)   3.8(1.1)   54.8(33.5)  50.8(33.8)  50.4(34.8)  57.5(32.5)  56.6(32.2)  54.0(32.1)
R-Odds  GDL     3.7(0.8)   3.4(0.8)   3.0(1.1)   5.1(0.4)   4.5(0.4)   3.9(1.0)   54.6(34.2)  53.0(34.6)  52.5(35.0)  58.1(32.7)  56.8(33.0)  55.0(32.5)
R-Odds  Mean    3.7(0.6)   2.9(0.9)   2.8(0.8)   4.9(0.4)   4.5(0.5)   3.7(1.1)   54.7(32.6)  49.5(33.2)  52.1(33.4)  57.8(31.4)  56.6(31.6)  54.1(32.3)
Rllr    SMLE    4.0(0.6)   2.7(1.2)   2.7(1.0)   5.0(0.2)   5.2(0.6)   3.9(1.3)   55.9(33.9)  48.4(35.0)  51.6(35.6)  58.0(32.8)  58.0(32.6)  54.4(32.8)
Rllr    DBCD    4.2(0.8)   3.3(2.6)   2.4(1.5)   5.0(0.4)   6.1(0.8)   4.2(1.9)   57.2(34.0)  49.9(35.9)  51.4(36.6)  58.6(33.1)  60.0(32.2)  55.4(33.2)
Rllr    SEU     4.0(0.6)   2.8(1.6)   2.4(1.0)   4.9(0.2)   5.4(0.8)   3.9(1.5)   56.1(33.9)  48.5(34.8)  51.2(35.7)  58.1(32.8)  58.2(32.5)  54.4(32.8)
Rllr    GDL     3.7(0.5)   2.5(1.3)   2.7(1.2)   4.9(0.4)   5.4(0.9)   3.8(1.5)   56.4(34.1)  50.4(35.8)  53.1(35.9)  58.9(33.1)  59.5(32.5)  55.7(33.1)
Rllr    Mean    3.9(0.6)   2.8(1.6)   2.5(1.1)   5.0(0.3)   5.6(0.8)   4.0(1.5)   56.4(32.6)  49.3(34.0)  51.8(34.6)  58.4(31.7)  58.9(31.2)  55.0(32.7)
Rrsihr  SMLE    4.2(1.1)   6.2(4.0)   2.3(1.5)   5.2(0.8)   6.1(0.7)   4.8(2.4)   56.0(33.9)  54.8(33.7)  48.7(36.4)  57.5(33.2)  58.4(32.0)  55.1(32.6)
Rrsihr  DBCD    4.3(1.5)   6.9(5.2)   2.0(1.6)   5.2(1.3)   6.5(1.1)   5.0(3.0)   56.8(34.0)  56.3(33.4)  48.2(37.0)  58.2(33.2)  59.4(31.8)  55.7(32.8)
Rrsihr  SEU     3.9(0.8)   4.8(3.4)   2.3(1.0)   4.8(0.4)   5.5(0.5)   4.3(1.9)   54.5(33.8)  50.5(34.5)  48.6(35.8)  56.4(33.0)  56.6(32.4)  53.3(32.7)
Rrsihr  GDL     4.3(0.9)   4.7(3.0)   2.2(1.6)   5.1(0.6)   6.1(0.9)   4.5(2.0)   57.4(33.7)  54.4(34.5)  50.6(36.6)  58.7(33.0)  59.7(32.1)  56.2(32.8)
Rrsihr  Mean    4.2(1.0)   5.7(3.8)   2.2(1.3)   5.1(0.8)   6.1(0.8)   4.6(2.3)   56.2(32.6)  54.0(32.8)  49.0(35.0)  57.7(31.8)  58.5(30.8)  55.1(32.5)
Rrpw    RPW     4.2(0.8)   6.2(0.5)   2.5(1.6)   5.5(1.4)   5.4(0.8)   4.8(1.7)   52.4(32.3)  55.9(32.1)  46.3(34.1)  55.8(32.1)  52.9(30.1)  52.7(31.0)
Rrpw    DL      4.3(0.8)   4.8(1.0)   2.6(1.7)   5.3(0.9)   5.3(0.4)   4.5(1.4)   56.0(33.5)  55.9(33.4)  50.0(36.1)  58.2(32.6)  57.4(32.5)  55.5(32.4)
Rrpw    SMLE    4.2(0.9)   6.5(0.6)   2.8(1.8)   5.4(1.6)   5.1(0.8)   4.8(1.7)   51.7(32.3)  56.2(31.8)  46.7(33.7)  55.7(31.9)  51.7(30.2)  52.4(30.9)
Rrpw    DBCD    4.3(0.9)   6.7(1.0)   2.9(2.1)   5.7(1.8)   4.8(1.0)   4.9(1.9)   51.2(31.8)  57.3(31.2)  47.0(34.1)  56.0(31.5)  48.3(29.2)  52.0(30.6)
Rrpw    SEU     3.8(0.6)   5.7(1.3)   2.2(0.6)   5.4(0.8)   5.1(0.6)   4.5(1.5)   54.0(33.1)  54.0(32.7)  48.3(34.4)  56.7(32.1)  55.9(31.7)  53.8(31.6)
Rrpw    GDL     4.0(0.8)   5.1(0.6)   2.7(1.6)   5.2(0.7)   5.0(0.8)   4.4(1.3)   54.6(33.5)  56.0(33.0)  50.2(35.3)  57.8(32.4)  56.4(32.3)  55.0(32.0)
Rrpw    Mean    4.1(0.8)   5.8(1.1)   2.6(1.5)   5.4(1.2)   5.1(0.7)   4.6(1.6)   53.3(31.4)  55.9(31.0)  48.1(33.2)  56.7(30.7)  53.8(29.8)  53.5(31.2)
Equal Allocation 4.0(0.5)  2.9(1.7)   2.4(1.0)   5.0(0.2)   5.6(0.8)   4.0(1.5)   56.2(33.9)  48.5(35.0)  50.9(35.9)  58.1(32.9)  58.4(32.4)  54.4(32.9)

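
The summary entries of Table 12 can be approximately reproduced from the per-scenario tables. The sketch below uses the Rllr / SMLE / Tmw row of Table 9 and assumes that the value in parentheses is the sample standard deviation across scenarios; that choice, and the helper name `summarize`, are assumptions for illustration rather than something stated in the report. Because the tabulated inputs are rounded to three decimals, the last digit of a recomputed standard deviation may differ slightly from the published entry.

```python
from statistics import mean, stdev

# Rllr / SMLE / Tmw row of Table 9.
# The first 5 entries are null cases (P1 = P2); the last 10 are alternatives.
row = [0.034, 0.043, 0.046, 0.044, 0.031,
       0.212, 0.659, 0.946, 0.999, 0.187,
       0.575, 0.948, 0.182, 0.667, 0.218]

def summarize(values):
    """Return (mean, sample SD) in percent, rounded to one decimal."""
    return round(100 * mean(values), 1), round(100 * stdev(values), 1)

type1_mean, type1_sd = summarize(row[:5])   # average type I error over null cases
power_mean, power_sd = summarize(row[5:])   # average power over alternative cases

print("type I error:", type1_mean, "power:", power_mean, "power SD:", power_sd)
```

The recomputed means (4.0 and 55.9) agree with the Rllr / SMLE / Tmw entries of Table 12.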
1.  ABSTRACT 

Applying  the  Emax  model  in  the  context  of  the  Loewe  additivity  model,  we  analyze  the  data  from  the  combination  drug 
study  of  trimetrexate  (TMQ)  and  AG2034  (AG)  in  low  and  high  folic  acid  (FA)  media.  The  Emax  model  provides  a  sufficient  fit  to 
the data. TMQ is more potent than AG in both the Low FA and High FA experiments. At low TMQ:AG ratios, when a smaller
amount of the more potent drug (TMQ) is added to a larger amount of the less potent drug (AG), the combination is synergistic.
However, when the TMQ:AG ratio reaches 0.4 or larger in the low FA medium, or 1 or larger in the high FA medium, synergy
tends to weaken and the mode of drug interaction becomes additive. In general, the synergistic effect is stronger at higher
doses, which produce stronger effects (effect closer to 1 - Emax), than at lower dose levels, which produce weaker effects
(effect closer to 1), in the same dilution series. The two drugs are more potent in the low FA medium than in the high FA
medium. The drug synergy, however, is stronger in the high FA medium.


2.  INTRODUCTION 


Due  to  complex  disease  pathways,  combination  treatments  can  be  more  effective  and  less  toxic  than  treatments  with  a 
single  regimen.  Successful  applications  of  combination  therapy  have  improved  the  effectiveness  in  treating  many  diseases.  For 
example,  the  combination  of  a  non-nucleoside  reverse  transcriptase  inhibitor  or  protease  inhibitor  with  two  nucleosides  is 
considered  a  standard  front-line  therapy  in  AIDS.  Typically,  a  combination  of  three  to  four  drugs  is  required  to  provide  durable 
response  and  immune  reconstitution  (1).  Another  example  is  that  platinum-based  doublet  chemotherapy  regimens  are  now 
considered  to  be  the  standard  of  care  in  patients  with  advanced  stage  non-small  cell  lung  cancer  (2).  Combination  treatments  have 
also  been  shown  to  prevent  and  to  overcome  drug  resistance  in  infectious  diseases  such  as  malaria,  and  in  complex  diseases  such 
as cancer (3, 4). The advent of targeted agents has also spurred the search for effective cancer therapies that combine
multiple targeted agents with or without chemotherapy, or that combine multiple treatment modalities such as drug
treatment, surgery, and/or radiation therapy (5, 6).

“How does one assess the effect of a combination therapy?” It is a simple question, yet the complexity of the answer
increases as one analyzes it further. A first answer may be that if a combination therapy shows an effect that is greater than the
effect produced by each single component given alone, the combination therapy is working. The notion of classifying drug
interaction as additive, synergistic, or antagonistic is logical and easily understandable in a general sense, but can be confusing
without a specific and agreed-upon definition. Excellent reviews of drug synergism can be found in Berenbaum (7), Greco et al. (8),
Sühnel (9), Chou (10), and Tallarida (11), to name a few. In essence, to quantify the effect of combination therapy, one must first
define what “additivity” is. If the combination effect is more (or less) than the additive effect of the single agents, then it is
considered synergistic (or antagonistic). Furthermore, because of the stochastic error in the measured effect in all
experiments, drug interaction should also be assessed in a statistical sense. A more rigorous definition declares synergy
only when the combined drug effect is statistically significantly higher than the additive effect. Conversely, antagonism
is declared when the combination effect is statistically significantly lower than the additive effect.

Despite  controversies  and  multiple  definitions  of  additivity  or  no  drug  interaction,  the  Loewe  additivity  model  is 
commonly  accepted  as  the  gold  standard  for  quantifying  drug  interaction  (7-11).  The  Loewe  additivity  model  is  defined  as: 


$$ \frac{d_1}{D_{y,1}} + \frac{d_2}{D_{y,2}} = 1 \qquad \text{(E 1)} $$

Here y is the predicted additive effect at the combination dose ($d_1$, $d_2$) when the two drugs do not interact. $D_{y,1}$ and $D_{y,2}$ are the
respective doses of drug 1 and drug 2 required to produce the same effect y when used alone. Note that the Loewe additivity can
be easily demonstrated in a “sham combination” (i.e., a drug is combined with itself or its diluted form). For example, suppose
drug 2 is a 50% diluted form of drug 1. The combination of one unit of drug 1 and one unit of drug 2 will produce the same
effect as 1.5 units of drug 1 or 3 units of drug 2. Plugging the respective values into equation (E 1), we have 1/1.5 + 1/3 = 1.
Given the dose-effect relationship for each single agent, say $E_i(d) = f_i(d)$ for agent i (i = 1, 2), $D_{y,i}$ can be obtained by using the
inverse function of $f_i$, say $f_i^{-1}(y)$. Replacing $D_{y,1}$ and $D_{y,2}$ in equation (E 1) with $f_1^{-1}(y)$ and $f_2^{-1}(y)$, respectively, we can rewrite
equation (E 1) as

$$ \frac{d_1}{f_1^{-1}(y)} + \frac{d_2}{f_2^{-1}(y)} = 1 \qquad \text{(E 2)} $$


Note that (E 2) involves an unknown variable y. By solving equation (E 2), the predicted additive effect $y_{add}$ can be obtained
under the Loewe additivity model. Denote the observed mean effect at the combination dose ($d_1$, $d_2$) by $y_{obs}$. The drug
combination at that dose is considered synergistic, additive, or antagonistic when the effect $y_{obs}$ is greater than, equal to, or less
than $y_{add}$, respectively. When the dose-effect curve is decreasing (or increasing), a synergistic effect corresponds to a smaller (or
larger) value than the predicted quantity.
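
As an illustration of solving (E 2) numerically, the following sketch applies bisection to the sham combination described above. The single-agent curve `f1`, its parameter values, and all function names are hypothetical choices made for this example only; any monotonically decreasing dose-effect curve with a known inverse could be substituted.

```python
def f1(d, ed50=1.0, m=2.0):
    """Hypothetical decreasing dose-effect curve for drug 1 (effect is 1 at d = 0)."""
    return 1.0 / (1.0 + (d / ed50) ** m)

def f1_inv(y, ed50=1.0, m=2.0):
    """Dose of drug 1 alone that produces effect y, for 0 < y < 1."""
    return ed50 * ((1.0 - y) / y) ** (1.0 / m)

def f2_inv(y):
    """Drug 2 is a 50% dilution of drug 1, so twice the dose is needed."""
    return 2.0 * f1_inv(y)

def loewe_additive_effect(d1, d2, tol=1e-12):
    """Solve d1/f1_inv(y) + d2/f2_inv(y) = 1 for y by bisection on (0, 1)."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        total = d1 / f1_inv(mid) + d2 / f2_inv(mid)
        if total > 1.0:
            hi = mid   # y is too high: the equivalent single-agent doses are too small
        else:
            lo = mid
    return 0.5 * (lo + hi)

# One unit of drug 1 plus one unit of drug 2 (the sham combination in the text).
y_add = loewe_additive_effect(1.0, 1.0)
print(y_add, f1(1.5))
```

For the sham combination the predicted additive effect coincides with the effect of 1.5 units of drug 1 alone, as the text states.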

Alternatively, to measure and quantify the magnitude of drug interaction, the interaction index (II) can be defined as:

$$ II = \frac{d_1}{D_{y,1}} + \frac{d_2}{D_{y,2}} \qquad \text{(E 3)} $$

Note that II < 1, II = 1, and II > 1 correspond to the drug interaction being synergistic, additive, and antagonistic, respectively.
Chou and Talalay (12) proposed the median effect equation (E 4) to characterize the dose-effect relationship in combination
studies:

$$ E(d) = \frac{(d/ED_{50})^m}{1 + (d/ED_{50})^m} \qquad \text{(E 4)} $$

where ED50 is the dose required to produce 50% of the maximum effect. Although the median effect equation can be applied in
many settings, it assumes that, when m is positive, E(d) = 0 for d = 0 and E(d) = 1 for d = ∞. On the other hand, when m
is negative, E(d) = 1 for d = 0 and E(d) = 0 for d = ∞. If we assume that the data follow the median effect equation, a
linear relationship can be found by plotting the logit transformation of the effect versus the logarithm-transformed dose. A more
detailed account of the interpretation and use of the interaction index can be found in a number of references (13-16). Several
methods for constructing the confidence interval estimation of the interaction index were proposed in Lee and Kong (17).
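
The linear relationship mentioned above follows from the median effect form E(d) = (d/ED50)^m / (1 + (d/ED50)^m): taking the logit gives log(E/(1 - E)) = m log(d) - m log(ED50). The sketch below checks this on noise-free values; the doses and the parameter values (ED50 = 3, m = 2) are illustrative assumptions, and the small least-squares helper `fit_line` is written out only to keep the example self-contained.

```python
import math

def median_effect(d, ed50, m):
    """Median effect equation: E(d) = (d/ED50)^m / (1 + (d/ED50)^m)."""
    x = (d / ed50) ** m
    return x / (1.0 + x)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, ybar - slope * xbar

doses = [0.5, 1.0, 2.0, 4.0, 8.0]
effects = [median_effect(d, ed50=3.0, m=2.0) for d in doses]

log_dose = [math.log(d) for d in doses]
logit_effect = [math.log(e / (1.0 - e)) for e in effects]

slope, intercept = fit_line(log_dose, logit_effect)
m_hat = slope                               # slope of the line estimates m
ed50_hat = math.exp(-intercept / slope)     # intercept = -m * log(ED50)
print(m_hat, ed50_hat)
```

With real data the points scatter around the line, and the fit is informative only where the observed effects lie strictly between 0 and 1.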

To help advance the research for developing and comparing methods for analyzing data for combination studies, Dr.
William R. Greco at the Roswell Park Cancer Institute has organized an effort and invited several groups to participate in an
exercise to compare rival modern approaches to model data from two-agent concentration-effect studies. We describe the data
and the statistical methods, including the Emax model and the calculation of the interaction index under the Emax model, in Section 3.
Exploratory  data  analysis  is  shown  in  Section  4.  Data  preprocessing  for  outlier  rejection  and  standardization  are  described  in 
Section  5.  The  main  result  of  the  data  analysis  is  presented  in  Section  6  with  a  summary  given  in  Section  7.  Discussion  is 
provided  in  Section  8. 

3.  MATERIAL  AND  METHOD 

3.1  Data  Sets 

Two data sets provided by Dr. Greco are used to examine the effect of the combination treatment of trimetrexate
(TMQ) and AG2034 (AG) in HCT-8 human ileocecal adenocarcinoma cells. The cells were grown in medium with two levels of
folic acid: 2.3 µM (the first data set, called Low FA) and 78 µM (the second data set, called High FA). Trimetrexate is a
lipophilic inhibitor of the enzyme dihydrofolate reductase, and AG2034 is an inhibitor of the enzyme glycinamide
ribonucleotide formyltransferase. The experiment was conducted on 96-well plates. The endpoint was cell growth measured
by absorbance (ranging from 0 to 2), recorded in an automated 96-well plate reader. Each 96-well plate included
8 wells as instrumental blanks (no cells), and the remaining 88 wells were used for drug treatments. The experiments were
performed using the “ray design,” which maintains a fixed dose ratio between TMQ and AG in a series of 11 dose dilutions. With
88 wells in each plate, each 5-plate stack studied the combination doses at 7 curves (i.e., design rays) plus a “curve” with all
controls. Two stacks were used for studying 14 design rays: TMQ only, AG only, and twelve other design rays with
a fixed dose ratio (TMQ:AG) for each ray. The fixed dose ratios in the Low FA experiment are 1:250, 1:125, 1:50, 1:20, 1:10,
1:5 (2 sets), 2:5, 4:5, 2:1, 5:1, and 10:1. Similarly, the fixed dose ratios in the High FA experiment are 1:2500, 1:1250, 1:500,
1:200, 1:100, 1:50 (2 sets), 1:25, 2:25, 1:5, 1:2, and 1:1. Data from each of the 16 curves (2 for controls, 2 for single agents, and
12 for combinations) are grouped together. Curves 1-8 were performed on the first stack with Curve 8 serving as the “control”
experiment, while Curves 9-16 were performed on the second stack with Curve 16 serving as the “control” experiment. Treatments
of cells in wells by different drug combinations were randomized across the plates. Five replicate plates were used for each of
the two stacks. Therefore, a total of 10 plates were used for each of the two medium conditions (Low FA and High FA). The
maximum number of treated wells per medium condition is 880 (16 curves x 11 dilutions x 5 replicates). Complete experimental
details and mechanistic implications were reported in Faessel et al. (18).
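
The bookkeeping of the plate layout can be checked with a few lines of arithmetic. The sketch below only restates the counts given in the paragraph above and converts the Low FA dose ratios to decimals; the variable names are illustrative.

```python
from fractions import Fraction

WELLS_PER_PLATE = 96
BLANK_WELLS = 8            # instrumental blanks (no cells)
DILUTIONS_PER_CURVE = 11   # serial dilutions along each design ray
PLATES_PER_STACK = 5       # replicate plates
STACKS = 2

treated_per_plate = WELLS_PER_PLATE - BLANK_WELLS              # wells used for treatments
curves_per_stack = treated_per_plate // DILUTIONS_PER_CURVE    # 7 rays + 1 control curve
total_curves = curves_per_stack * STACKS
total_plates = PLATES_PER_STACK * STACKS
max_treated_wells = total_curves * DILUTIONS_PER_CURVE * PLATES_PER_STACK

# TMQ:AG fixed dose ratios for the twelve Low FA combination rays (1:5 is run twice).
low_fa_ratios = ["1:250", "1:125", "1:50", "1:20", "1:10", "1:5", "1:5",
                 "2:5", "4:5", "2:1", "5:1", "10:1"]

def ratio_value(text):
    """Convert a 'TMQ:AG' ratio string to the numeric ratio TMQ/AG."""
    tmq, ag = text.split(":")
    return Fraction(int(tmq), int(ag))

ratio_values = [float(ratio_value(r)) for r in low_fa_ratios]
print(treated_per_plate, total_curves, total_plates, max_treated_wells)
print(ratio_values)
```

The ratio 2:5 corresponds to the value 0.4 quoted in the abstract as the point where synergy starts to weaken in the low FA medium.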


3.2  Statistical  Methods 


3.2.1:  Emax  model 

Due  to  the  fact  that  the  measure  of  cell  growth  plateaus  and  does  not  reach  zero  at  the  maximum  dose  levels  used  in  the 
experiments,  the  median  effect  equation  (E  4)  does  not  fit  the  data.  Instead,  we  take  the  Emax  model  (19)  to  fit  the  data  at  hand. 

$$ E(d) = E_0 - \frac{E_{\max}\,(d/ED_{50})^m}{1 + (d/ED_{50})^m} \qquad \text{(E 5)} $$

where E0 is the base effect, corresponding to the measurement of the cell growth when no drug is applied; Emax is the maximum
effect attributable to the drug; ED50 is the dose level producing half of Emax; d is the dose level that produces the effect E(d);
and m is a slope factor (Hill coefficient), measuring the sensitivity of the effect within a dose range of the drug. Thus, E0 - Emax is
the asymptotic effect when a very large dose of the drug is applied. Figure 1 shows a few examples of the Emax model where E0 is
assumed to be 1. The parameter m governs how quickly the curve drops. For the three cases in the first row in Figure 1, ED50 is
fixed at 2 and Emax is at 0.8, while the slope varies. When m = 1 (Panel A), the dose response curve drops slowly; when m = 5
(Panels B and E), a sigmoid-shaped curve is formed; and when m = 20 (Panels C and F), the drop of the sigmoid curve becomes
very steep. In the three curves in the first row, as the dose increases, the curves drop, and the effect asymptotes to 1 - Emax = 0.2.
In the second row, the three plots are set at Emax = 1, which means that as the dose increases, the treatment will reach the
theoretical full effect. For example, if the effect measure is cell count, all the cells will be killed at very high doses of the
treatment when Emax = 1. The figures also show that, as ED50 increases, the curves are shifted to the right, indicating that the
treatment is less potent. In all cases, when m increases, the effect drops more rapidly. We apply the non-linear weighted least
squares method to estimate the parameters in the Emax model. Because of the heteroscedasticity observed in the data, in which the
variance increases as the observed response increases, we use the reciprocal of the fitted response as the weight function (20).
Estimation is carried out using S-PLUS, R (21), and SAS (22).
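
To make the model and the weighting scheme concrete, here is a minimal sketch. It assumes the Emax form E(d) = E0 - Emax (d/ED50)^m / (1 + (d/ED50)^m), which matches the parameter descriptions in this section, and it only evaluates the weighted least-squares criterion; it does not reproduce the S-PLUS, R, or SAS fitting routines. The function names and the example parameter values (taken from the Figure 1 description) are illustrative.

```python
def emax_effect(d, e0, emax, ed50, m):
    """Emax model: effect falls from e0 at d = 0 toward e0 - emax at large d."""
    x = (d / ed50) ** m
    return e0 - emax * x / (1.0 + x)

def weighted_sse(params, doses, responses):
    """Weighted residual sum of squares with weight 1 / fitted response.

    Assumes every fitted response is positive, as is the case for
    absorbance-type readouts with e0 - emax > 0.
    """
    e0, emax, ed50, m = params
    total = 0.0
    for d, y in zip(doses, responses):
        fitted = emax_effect(d, e0, emax, ed50, m)
        total += (y - fitted) ** 2 / fitted
    return total

# Example parameters from the Figure 1 description: E0 = 1, Emax = 0.8, ED50 = 2, m = 5.
params = (1.0, 0.8, 2.0, 5.0)
doses = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
exact = [emax_effect(d, *params) for d in doses]

print(exact)
print(weighted_sse(params, doses, exact))            # zero for noise-free data
print(weighted_sse((1.0, 0.8, 3.0, 5.0), doses, exact))  # larger for a wrong ED50
```

A general-purpose optimizer would minimize `weighted_sse` over the four parameters; since the weights depend on the fitted values, such a fit is usually carried out iteratively.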


3.2.2:  Interaction  Index  under  the  Emax  Model 

As with the median effect model, the Emax model can be applied to fit the single-drug and combination dose
response curves, and the interaction index can then be calculated accordingly. Although equation (E 5) allows different values of
E0 and Emax for different curves, for calculating the interaction index we need to assume that all curves have the same E0 so that the
“base measure” of no drug effect is the same in all curves. This can be achieved by dividing all effect measures by the mean of
the controls. Note that Emax can remain different in different curves to signify different drug potencies. However, the calculation
of the interaction index is a little more complicated when different drugs or combinations produce different values of Emax, as will be
shown later.


From this point on, we assume that the dose response curve follows the Emax model in equation (E 6).

Our experiments study the effect of treatments in inhibiting cell growth. The effect measure is cell growth, corresponding to the
amount of cells observed. Hence, the height of the dose effect curve decreases when the dose increases. In this case, we have m >
0. In addition, as d goes to infinity, the effect plateaus at 1 - Emax. Hence, Emax must be between 0 and 1.
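
The fitting criterion can be sketched as follows. This is a minimal illustration, not the authors' S-PLUS, R, or SAS code. It assumes a Hill-type form for the Emax curve with E0 = 1, namely E(d) = 1 - Emax (d/ED50)^m / (1 + (d/ED50)^m), chosen because it matches the properties stated above (effect of 1 at zero dose, plateau at 1 - Emax, steeper drop for larger m). The function names `emax_effect` and `weighted_rss` are hypothetical.

```python
def emax_effect(dose, emax, ed50, m):
    """Standardized effect under an ASSUMED Hill-type Emax curve with E0 = 1.

    The effect equals 1 at dose 0 and approaches 1 - emax as dose grows.
    """
    ratio = (dose / ed50) ** m
    return 1.0 - emax * ratio / (1.0 + ratio)


def weighted_rss(params, doses, effects):
    """Weighted residual sum of squares, weight = 1 / fitted response.

    params is (emax, ed50, m). A nonlinear optimizer would minimize this
    quantity; the optimizer itself is outside the scope of this sketch.
    The fitted response must stay positive, which holds when emax < 1.
    """
    emax, ed50, m = params
    total = 0.0
    for d, y in zip(doses, effects):
        fitted = emax_effect(d, emax, ed50, m)
        total += (y - fitted) ** 2 / fitted
    return total
```

Dividing each squared residual by the fitted response down-weights observations near the top of the curve, where the text reports larger variance.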

In the study of two-drug combinations, we need to fit three curves using the Emax model: curve 1 for drug 1 alone, curve
2 for drug 2 alone, and curve c for the drug combination. Denote Emax,i, ED50,i, and m_i as the three parameters for curve i (i = 1, 2, c).
Given an effect e (e > 1 - Emax), the corresponding dose d(e) can be calculated by inverting the fitted dose response curve.

Note that the dose for the combination treatment can be obtained simply as the sum of the doses of the single agents. This
approach works well for the ray design with constant or varying relative potency between the two drugs (12, 17). Without loss of
generality, we can assume that Emax,1 ≥ Emax,2. In addition, we assume that the dose ratio for the two drugs in the combination
treatment (dc = d1 + d2) is fixed with d1/d2 = p. Upon fitting the three dose response curves, the interaction index at a fixed effect e,
where e ∈ (1 - Emax,c, 1), can be calculated as follows:


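$$
\mathrm{II} =
\begin{cases}
\dfrac{d_c(e)\,p/(1+p)}{D_1(e)} + \dfrac{d_c(e)/(1+p)}{D_2(e)}, & \text{for } 1 - E_{max,2} < e < 1, \\[2ex]
\dfrac{d_c(e)\,p/(1+p)}{D_1(e)}, & \text{for } 1 - E_{max,1} < e \le 1 - E_{max,2}.
\end{cases}
\qquad \text{(E 8)}
$$

Here d_c(e) is the total combination dose that produces effect e, and D_1(e) and D_2(e) are the doses of drug 1 alone and drug 2
alone that produce the same effect.
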


For e < 1 - Emax,1, the interaction index cannot be calculated. However, the combination effect in this range is more
than additive because it reaches an effect level that no single agent alone can achieve. If Emax,1 = Emax,2, the interaction index
can be calculated using the first formula in (E 8).
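
The calculation of the interaction index can be sketched as below. As before, this assumes the Hill-type Emax form with E0 = 1, so that the dose producing effect e is ED50 ((1 - e) / (e - (1 - Emax)))^(1/m). That inverse is derived from the assumed form and is not quoted from the source. The names `dose_for_effect` and `interaction_index` are hypothetical, and each curve is passed as a tuple (Emax, ED50, m).

```python
def dose_for_effect(e, emax, ed50, m):
    """Dose giving effect e under the ASSUMED Hill-type Emax curve (E0 = 1).

    Only defined for effects the curve can reach: 1 - emax < e < 1.
    """
    if not (1.0 - emax < e < 1.0):
        raise ValueError("effect must lie strictly between 1 - Emax and 1")
    return ed50 * ((1.0 - e) / (e - (1.0 - emax))) ** (1.0 / m)


def interaction_index(e, p, drug1, drug2, combo):
    """Interaction index at effect e for a fixed dose ratio p = d1/d2.

    drug1, drug2, combo are (emax, ed50, m) tuples, labeled so that
    Emax of drug 1 is at least Emax of drug 2. Returns None when no
    single agent can reach effect e, in which case the index is undefined.
    """
    emax1, emax2 = drug1[0], drug2[0]
    if emax1 < emax2:
        raise ValueError("label the drugs so that Emax,1 >= Emax,2")
    dc = dose_for_effect(e, *combo)  # total combination dose at effect e
    if e <= 1.0 - emax1:
        return None  # below the plateau of both single agents
    term1 = dc * p / (1.0 + p) / dose_for_effect(e, *drug1)
    if e <= 1.0 - emax2:
        return term1  # only drug 1 alone can reach this effect
    return term1 + dc / (1.0 + p) / dose_for_effect(e, *drug2)
```

A quick sanity check is the sham combination: if both single agents and the combination share one curve, the index equals 1 at every reachable effect, which is the additive case under Loewe additivity.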

3.2.3:  Confidence  Interval  for  the  Interaction  Index 

We can apply the delta method to calculate the (large sample) variance of the interaction index (23). From our previous
work (17), we found that better estimation of the confidence interval for the interaction index can be achieved by working on the
logarithmic transformation of the interaction index.

By applying the delta method, Var(log(II)) ≈ Var(II) / II^2.


When 1 - Emax,2 < e < 1, the variance of II can be calculated by

Var(II) = [...]

If 1 - Emax,1 < e < 1 - Emax,2, then [...] for i = 1, 2, c.


Upon the calculation of the variance for log(II), the point-wise (1 - α)100% confidence interval for II at a specified effect can be
constructed as

$$
\left[\; \mathrm{II}\,\exp\!\left(-z_{\alpha/2}\sqrt{\mathrm{Var}(\log(\mathrm{II}))}\right),\;\;
\mathrm{II}\,\exp\!\left(z_{\alpha/2}\sqrt{\mathrm{Var}(\log(\mathrm{II}))}\right) \;\right]
\qquad \text{(E 11)}
$$

where z_{α/2} is the upper α/2 percentile of the standard normal distribution. We also construct the simultaneous confidence
band for the interaction index over the range of estimated responses. Because the estimation process involves estimating nine
parameters from three curves, to construct a Scheffé type of simultaneous confidence band, we simply replace z_{α/2} in equation
(E 11) by (χ²_p(α))^{1/2}, where p = 9 (24).
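
The interval construction can be illustrated with a short sketch. It takes an estimate of II and of Var(II) as given, since computing that variance is not shown here, applies the delta-method step Var(log(II)) ≈ Var(II)/II^2, and exponentiates back. The function name `log_scale_interval` is hypothetical. The chi-square quantile is entered as a tabulated constant (16.919 for 9 degrees of freedom at the 0.95 level) because the Python standard library has no chi-square quantile function.

```python
from math import exp, sqrt
from statistics import NormalDist


def log_scale_interval(ii, var_ii, critical_value):
    """Confidence limits for II built on the log scale.

    ii             -- estimated interaction index (must be positive)
    var_ii         -- estimated variance of ii (assumed supplied by the caller)
    critical_value -- z quantile for a point-wise interval, or the square
                      root of a chi-square quantile for a Scheffe-type band
    """
    var_log_ii = var_ii / ii ** 2  # delta method
    half_width = critical_value * sqrt(var_log_ii)
    return ii * exp(-half_width), ii * exp(half_width)


# Point-wise 95% interval: upper 2.5% point of the standard normal.
z_crit = NormalDist().inv_cdf(0.975)

# Scheffe-type 95% band with p = 9 parameters; 16.919 is the tabulated
# 0.95 quantile of the chi-square distribution with 9 degrees of freedom.
scheffe_crit = sqrt(16.919)

pointwise = log_scale_interval(0.5, 0.01, z_crit)
band = log_scale_interval(0.5, 0.01, scheffe_crit)
```

Because the limits are symmetric on the log scale, their product equals II squared, and the simultaneous band is always wider than the point-wise interval at the same effect level.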


3.3  Data  Analysis  Plan 

The overall objective of the data analysis is to assess the synergistic effect of the combination of TMQ and AG in both low
and high FA media. We first apply exploratory data analysis and then estimate the dose-response relationship using the
Emax model. The drug interaction is evaluated by calculating the interaction index under the Loewe additivity model. Exploratory
data analysis is performed to understand the data structure and patterns, and to determine whether preprocessing of the data in
terms of outlier rejection and standardization is required before the data modeling. The Low FA and High FA experiments are analyzed
separately and then compared. For each experiment, the Emax model is applied to fit the two marginal and twelve combination dose
response  curves.  The  interaction  index  and  its  95%  confidence  intervals  are  computed  for  each  of  the  twelve  combinations.  The 
overall  pattern  of  the  drug  interaction  is  assessed  by  examining  the  interaction  index  from  the  12  fixed-ratio  combinations 
together.  A  one-dimensional  distribution  plot  via  the  BLiP  plot  (25)  is  applied  to  display  the  data.  A  two-dimensional  scatter  plot, 
a  contour  plot,  and  an  image  plot  as  well  as  a  three-dimensional  perspective  plot  are  used  to  show  the  dose  response  relationship. 
A  Trellis  plot  (26)  is  also  applied  to  assemble  the  individual  plots  together  into  consecutive  panels  conditioning  on  different 
values  of  fixed  dose  ratios. 


4.  EXPLORATORY  DATA  ANALYSIS 

As in all data analyses, we begin with exploratory data analysis. For the Low and High FA experiments, there are 871
and 879 readings, respectively. Only 9 and 1 observations are missing out of the maximum of 880 readings in each experiment,
respectively. The data also come with designated curve numbers ranging from 1 to 16 and data point numbers ranging from 1 to
176. Each curve number indicates a specific dose combination. We re-label the curves as A-P, where A and B correspond to the
control (no drug) curves, C and D correspond to the TMQ and AG alone curves, and curves E through P correspond to the
combination curves with fixed dose ratios in ascending order. Each point number indicates the readings at each specific dilution
of each curve. Since five duplicated experiments were performed, there are up to five readings for each specific point number.
There is, however, no designation of the plate number in the data received. Figure 2 shows the variable width percentile plot of the
distribution of the effect from the Low FA and High FA experiments using the BLiP plot, with each segment corresponding to a
five percent increment (25). The plot gives an overall assessment of the distribution of the outcome variable of cell growth
without conditioning on experimental settings. The middle 20% of the data (40th to 60th percentiles) are shaded in a light orange
color. This figure indicates that the data have a bimodal distribution, with most data clustered either around a low value of 0.2 or
around a high value of 1.2. For the Low FA experiment, the distribution of the effect ranges from 0.072 to 1.506 with the lower,
middle, and upper quartiles being 0.149, 0.449, and 1.150, respectively. Similarly, for the High FA experiment, the effect ranges
from 0.070 to 1.545. The three quartiles are 0.213, 0.990, and 1.1495, respectively. The median of the Low FA data is smaller
than the median of the High FA data. The bimodal distributions could result from steep dose response curves. As a consequence,
the slope may not be well estimated in certain cases.

To help understand the pattern of the fixed ratio dose assignment in a ray design and the relationship between the fixed
ratio doses and curve number, we plot the logarithmically transformed doses of TMQ and AG in Figure 3 for both the Low FA and
High FA experiments. As can be seen, Curves A and B are the controls with no drugs. Curves C and D correspond to the single
drug study of TMQ and AG, respectively. Curves E through P are the various fixed ratio combination doses of TMQ and AG.
Note that Curves J and K have the same dose ratios. Within each curve, the 11 dilutions are marked by 11 circles. For the
combination studies, the curves for different dose ratios are parallel to each other on the log dose scale. If the same plot is shown
in the original scale, these lines will form "rays," radiating out from the origin like sun rays. Hence, the term "ray design" is used
to describe this type of experiment. The corresponding dose ranges used for each drug alone are: 5.47 × 10^-6 to 0.56 µM for TMQ
in both the Low FA and High FA experiments, and 2.71 × 10^-5 to 2.78 µM for AG2034 in the Low FA experiment and 2.71 × 10^-4
to 27.78 µM in the High FA experiment.

Figures  4  and  5  show  the  raw  data  of  the  effect  versus  dose  level  by  curve  for  the  Low  FA  and  High  FA  experiments, 
respectively.  Instead  of  using  the  actual  dose,  we  plot  the  data  using  a  sequentially  assigned  dose  level  to  indicate  each  dilution 
within each curve such that the data can be shown clearly. In addition, the data points at each dilution for each curve are coded
from 1 to 5 according to their order of appearance in the data set. We assume that these numbers correspond to the replicate
number for each design point (well position in the stack of 5 plates). Because the plate number was not listed in the data, we
cannot be certain that this is the case. From the plot, one can see that there are outliers in several dilution series. Notably, in Figure 4, the
effects  from  plate  (replicate)  #1  in  Curves  B,  E,  F,  and  K  tend  to  be  lower  than  all  other  replicates.  There  are  also  some  unusually 
large  values  seen,  for  example,  replicate  2  in  Curve  A,  dose  level  (dilution  series)  6;  replicate  3  in  Curve  L,  dose  level  4;  and 
replicate  2  in  Curve  M  dose  level  1.  Similarly,  for  the  High  FA  experiments,  plate  #1  seems  to  have  some  low  values  in  Curves 
B,  C,  H,  I,  and  J,  and  plate  #4  seems  to  have  some  low  values  in  Curves  E,  K,  N,  O,  and  P.  These  findings  indicate  that  certain 
procedures  need  to  be  performed  to  remove  the  obvious  outliers  in  order  to  improve  the  data  quality  before  the  data  analysis. 

Figure  6  shows  the  perspective  plot,  contour  plot,  and  image  plot  for  the  Low  FA  experiment.  From  the  perspective 
plots in Panels A (back view), B (front view), and C (side view), we can see that the effect starts at a high plateau at an
effect level of about 1.2 when the doses of TMQ and AG are small. As the dose of each drug increases, the effect remains roughly
constant for a while, and then a sudden drop occurs. This steep downward slope can be found by taking the trajectory of any
combination  of  the  TMQ  and  AG  doses,  which  is  evident  in  the  dose  response  curves  shown  in  Figures  4  and  5  as  well.  The  steep 
drop  of  the  effect  can  also  be  found  in  the  contour  plot  and  the  image  plot.  Similar  patterns  of  the  dose  response  relationship  are 
shown  in  Figure  7  for  the  High  FA  experiment  as  well.  The  drop  of  the  effect  occurs  at  smaller  doses  in  the  Low  FA  experiment 
and  at  larger  doses  in  the  High  FA  experiment. 

5.  DATA  PREPROCESSING:  OUTLIER  REJECTION  AND  DATA  STANDARDIZATION 
5.1  Outlier  Rejection 

To address the concern that outliers may adversely affect the analysis outcome, we devised the following simple plan.
For each of the 176 point numbers (16 curves × 11 dilutions), the five effect readings should be close to each other because they
are from the replicated experiments. However, since the plate number was not in the data set, we cannot assess the plate effect.
Nor can we reject a replicate plate entirely should there be an outlying plate, or apply a mixed effect model treating
the plate effect as a random effect. For the four or five effect readings in each point number (only 9 point numbers in the Low FA
and 1 in the High FA experiments have 4 readings), we compute the median and the interquartile range. An effect reading is
considered an outlier if the value is beyond median ± 1.4529 times the interquartile range. If the data are normally distributed
(i.e., follow a Gaussian distribution), this range covers the middle 95% of the data. Hence, only about 5% of the data
points (2.5% at each extreme) are considered outliers. The number 1.4529 is obtained by qnorm(.975)/(qnorm(.75) -
qnorm(.25)), where qnorm(x) is the quantile function which returns the xth quantile of the standard normal distribution. Upon
applying the above rule, 129 out of 871 (14.8%) effect readings in the Low FA experiment and 126 out of 879 (14.3%) in the High FA
experiment are considered outliers and are removed before proceeding to further analysis. The numbers of outliers in replicates 1
to 5 are 60, 28, 19, 14, and 8 for the Low FA experiment and 35, 18, 21, 34, and 18 for the High FA experiment, indicating that
there is a non-random pattern of outliers which could be attributed to experimental conditions. Note that the outlier rejection
algorithm is only applied "locally." In other words, it only applies to the up to five replicated readings in each of the 176
experimental conditions.
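
The local rejection rule can be sketched as below for one point number at a time. The constant is computed exactly as described, from standard normal quantiles. The quartile definition is an assumption: the text does not say which one was used, so the sketch uses linear interpolation between order statistics (Python's "inclusive" method). The function name `split_outliers` is hypothetical.

```python
from statistics import NormalDist, median, quantiles

_std_normal = NormalDist()

# qnorm(.975) / (qnorm(.75) - qnorm(.25)), about 1.4529
IQR_MULTIPLIER = _std_normal.inv_cdf(0.975) / (
    _std_normal.inv_cdf(0.75) - _std_normal.inv_cdf(0.25)
)


def split_outliers(readings):
    """Split the replicate readings of one point number into (kept, dropped).

    A reading is dropped when it lies farther than IQR_MULTIPLIER times
    the interquartile range from the median of the same replicate set.
    The quartile method is an ASSUMPTION (linear interpolation).
    """
    q1, _, q3 = quantiles(readings, n=4, method="inclusive")
    centre = median(readings)
    half_width = IQR_MULTIPLIER * (q3 - q1)
    kept = [x for x in readings if abs(x - centre) <= half_width]
    dropped = [x for x in readings if abs(x - centre) > half_width]
    return kept, dropped


kept, dropped = split_outliers([1.0, 1.1, 1.2, 1.3, 5.0])
```

With only four or five readings per point number, the sample median and interquartile range are themselves noisy, so the share of readings flagged in practice can differ from the nominal 5% that holds for a known normal distribution.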


5.2  Data  Standardization 


After outliers are removed from the data, we compute the mean of the control curves. The means for Curves 8 and 16
(the two control curves under the original curve numbering) are 1.1668 and 1.1534 for the Low FA experiment and 1.1483 and
1.1477 for the High FA experiment, respectively. To apply the Emax model in equation (E 6) with E0 = 1, we standardize the data
by dividing the effect readings of Curves 1-7 by the mean of Curve 8 and those of Curves 9-15 by the mean of Curve 16.
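
The standardization step amounts to a single division per reading. A minimal sketch follows, with a hypothetical function name and made-up numbers used only to show the arithmetic.

```python
from statistics import mean


def standardize(readings, control_readings):
    """Divide each effect reading by the mean of its control curve.

    After this step a reading equal to the control mean becomes 1,
    which matches the E0 = 1 convention of the fitted model.
    """
    control_mean = mean(control_readings)
    return [y / control_mean for y in readings]


# Illustrative values only, not taken from the data set.
scaled = standardize([1.25, 0.25], [1.0, 1.5])
```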

6.  RESULTS 

6.1  Results  for  the  Low  Folic  Acid  Experiment 

The Emax model in equation (E 6) was applied to fit all dose response curves. For the Low FA experiment, the
parameter estimates, their corresponding standard errors, and the residual sum of squares are given in Table 1. The dose response
relationships showing the data and the fitted curves are displayed in Figure 8. Note that although model fitting was performed on
the original dose scale, dose is plotted on the logarithmically transformed scale to better show the dose response relationship. The
fitted marginal dose response curves for TMQ (Curve C) and AG (Curve D) are shown in a blue dashed line and a red dotted line,
respectively. Table 1 shows that ED50 is 0.00133 for TMQ and 0.00621 for AG, indicating that TMQ is about 4.7 times more
potent than AG at the ED50 level. For Curves E through P, the fitted dose response curve for the combination treatment is shown
as a solid black line superimposed on the marginal dose response curves. The proposed Emax model fits all curves well except for
Curves G, H and K. For Curve G, although the model estimates converge in an initial attempt, the parameter m is estimated with
a standard error of 30.3. The large standard error essentially indicates that the estimate of m is not reliable. For Curve K, the
model does not converge on the original dose scale but converges on the logarithmically transformed dose scale. However, the
standard error of the estimate of m is still very large, which leads us to believe that the model is not very stable either. For Curve
H, as can be seen in Figure 8, there are no observed effects between 0.3 and 1 from the second to the fifth dilutions. The
parameter m cannot be estimated and the model fails to converge on both the original scale and the logarithmic scale. We
conclude that the data do not provide sufficient information to yield a reasonable estimate of the parameter m. Therefore, we
take a remedial approach by fixing m and then estimating the other two parameters. Upon checking the
data, we set the parameter m to 5, 4.5, and 5 for Curves G, H, and K, respectively. The choice of m is somewhat arbitrary, with the
goals of yielding a good fit to the data and producing a small residual sum of squares. The resulting "reduced" models fit the data
reasonably well, but with the consequence that there is no standard error estimate for m, which affects the variance estimation of
the interaction index (to be shown later). Based on limited sensitivity analysis, the estimation of the interaction index remains
reasonably robust.

In all dose response curves, the standardized effect level starts to drop between dose levels (dilutions) 3 and 6. Once the
effect starts to drop, it drops quickly and plateaus at the 1 - Emax level. There are ample data points at the effect levels around 1
(dose levels 1-4) and 1 - Emax (dose levels 8-11). However, due to the sharp drop in the dose response curves, fewer data points
can be found in the middle of the effect range. When the data points become too few or do not spread out to cover
enough of the range, it becomes harder for the model to converge, as seen in Curves G, H, and K. The overall results for the curve
fitting of the Low FA experiment are that the Emax estimates are between 0.863 and 0.890; the ED50 estimates are between
0.00133 and 0.00621; and the m estimates are between 1.971 and 5.473. The residual sums of squares are between 0.0599 and
0.1025, with no large values, suggesting that the model fits the data reasonably well.

Based on the fitted dose response curves, the interaction index (II) can be calculated over the entire effect range and at
specific dose combinations. Table 2 gives the estimated interaction index and its 95% point-wise confidence
interval at each dose combination for each combination curve. The II is calculated at the predicted effect level from the
combination curve and not at the observed effect level. The results are shown in a trellis plot in Figure 9, where red lines
represent the point-wise confidence intervals at each specific effect level and black dashed lines indicate the simultaneous
confidence bands of II for the entire range. From the figure we find that the interaction index can be estimated with very good
precision in all curves except at the two extremes, when the effect is close to 1 or 1 - Emax. The trend and the pattern of the
interaction index are clearly shown in these figures. For Curves E through K, i.e., with a TMQ:AG dose ratio ranging from 0.004
to 0.2, synergy is observed in the effect range between 0.2 and 0.9. For Curves L and M, which have TMQ:AG ratios of 0.4 and 0.8,
synergy is observed at the low effect levels from 0.2 to about 0.5. Beyond 0.5 the combinations are generally additive.
For Curves N, O, and P, with TMQ:AG ratios of 2, 5, and 10, the synergistic effect is lost and we see additivity in all dose ranges.

6.2  Results  for  the  High  Folic  Acid  Experiment 

Similarly, Table 3 gives the parameter estimates, the corresponding standard errors, and the residual sums of squares for all the
curves in the High FA experiment. Unlike in the Low FA case, the model fitting for all curves in the High FA experiment
converges using the Emax model. The estimated Emax ranges from 0.831 to 0.893; ED50 ranges from 0.0137 to 0.1943 except for
Curve D (AG alone with ED50 = 0.5224); and m ranges between 1.468 and 3.625. The residual sums of squares are between
0.0615 and 0.1134. Compared to the Low FA experiment, the ED50 values are higher in the High FA experiment, indicating that the
drugs are less potent in the high FA medium. Note that the doses for TMQ are the same between the two experiments but the doses
for AG are 10 times higher in the High FA experiment. In addition, ED50 = 0.0137 and 0.00133 for TMQ alone in the
High and Low FA experiments, respectively, which indicates that the drug is about 10 times less potent in the high FA medium
than in the low FA medium. The potency of AG is even more dramatically reduced. Figure 10 shows that the Emax model
provides an excellent fit to all the curves. Table 4 gives the detailed account of the interaction index in all dilutions for all
combination curves. The results are summarized in a trellis plot in Figure 11. Again, the red lines represent the point-wise
confidence intervals at each specific effect level and black dashed lines correspond to the simultaneous confidence bands of II for
the whole range. With the high FA medium, synergy can be achieved for most of the drug combinations across the effect range
except at the very low or very high effects. The confidence intervals are still very tight although they are a little wider than their
Low FA counterparts. As the TMQ:AG ratio increases from 0.0004 to 0.5, synergy is observed across all dilution series. In
addition, higher synergy is observed at the lower effect levels, particularly when the TMQ:AG ratio is 0.01 or lower (Curves E, F, G,
H, and I). In the middle effect levels (effect between 0.2 and 0.8), II ranges from about 0.1 in Curves J and K, to 0.12 in Curve L,
to 0.15 in Curve M, to 0.25 in Curve N, and to 0.35 in Curve O. The higher the TMQ:AG ratio, the less synergy it produces.
In Curve P, when the TMQ:AG ratio reaches 1, synergy is lost.

7.  SUMMARY 

In both the Low FA and High FA experiments, TMQ is more potent than AG. At low TMQ:AG ratios, i.e., when a
small amount of the more potent drug (TMQ) is added to a larger amount of the less potent drug (AG), synergy results.
However, when the TMQ:AG ratio reaches 0.4 or larger for the low FA medium, or 1 or larger for the high FA medium,
synergy tends to weaken, or the interaction becomes additive. In general, the synergistic effect
is stronger at higher doses, which produce stronger effects (effect closer to 1 - Emax), than at the lower dose levels, which produce
weaker effects (effect closer to 1), in the same dilution series.

The two drugs are more potent in the low FA medium than in the high FA medium. The drug synergy, however,
is stronger in the high FA medium.


8.  DISCUSSION 

The data supplied by Dr. Greco provide an excellent opportunity to apply and compare various approaches for studying
combination drug effects. For the median effect model, a linear relationship between the logit transformed effect and the log-dose
makes the model fitting straightforward and easy. However, when measuring cell growth as in the data we received, if the
maximum drug effect reaches a plateau and does not kill all the cells even at the highest experimental doses, the median effect
model (12) does not apply. We adopt the Emax model (19), which provides an adequate fit for most data. For the Emax model,
parameter estimates have to be obtained via iterative procedures, for example, the non-linear weighted least squares method,
which can address the heteroscedasticity problem. Model convergence is not guaranteed; whether the model converges
depends on the data and the choice of the initial values. We find that PROC NLIN in SAS provides a more comprehensive
and robust environment for estimating parameters with nonlinear regression compared to the nls() function in S-PLUS/R. It can
be useful to apply SAS first to estimate the parameters, and then feed the results into S-PLUS/R for further data analysis and
graphics. Unlike fitting the linearly transformed median effect model via linear regression, for which a solution can always be
found, fitting the Emax model via nonlinear regression may not converge in some cases. The nonconvergence of the
model may indicate pathological conditions in the data such that the data do not provide adequate information for model fitting.
We had convergence problems for Curves G, H, and K in the Low FA experiment. In these cases, there are not sufficient data
in the middle of the effect range; hence, the parameters cannot be estimated reliably. We had to fix the m parameter before we
could estimate the other two parameters. From the dose response curves, we find that TMQ is more potent than AG and the drugs
are more potent in the low FA medium than in the high FA medium.

Upon the construction of the marginal and combination dose response curves, we applied the Loewe additivity model
to compute the interaction index. Note that the definition of drug interaction, such as the interaction index, is model dependent. In
addition, based on the definition of the interaction index (7, 8), the dose level should be on the original physical scale.
No matter which model one uses, the dose levels used in calculating the interaction index must be translated back to the original
dose scale. Under the given model, we find that the drug interaction between TMQ and AG is largely synergistic. Synergy is more
evident in the high FA experiment than in the low FA experiment. In addition, synergy is more likely to be observed
when a small dose of the more potent drug (TMQ) is added to a large dose of the less potent drug (AG). When a large amount of
the more potent drug is present, adding the less potent drug does not show synergy because the effect is largely achieved by the
more potent drug already. In addition, the interval estimation shows that the 95% confidence intervals are wider at the two
extremes of the effect, which are closer to 1 or 1 - Emax. The result is consistent with many regression settings where estimation
achieves higher precision in the center of the data but lower precision at the extremes.


We have provided a simple yet useful approach for analyzing the drug interaction for combination studies. The
interaction index for each fixed dose ratio is computed and then displayed together using the trellis plot. The method works well for
the ray design. Other methods have been proposed to model the entire response surface using the parametric approach (27) or the
semiparametric approach (28). The results from applying the semiparametric model are reported in a companion article (29).

Figure Legend:

Figure 1: Dose response curves under the Emax model by varying the parameters Emax, ED50, and m.

Figure  2:  Variable  width  percentile  plot  for  the  observed  effect  in  experiments  with  low  and  high  folic  acid  media.  Each  vertical 
bar  indicates  a  five  percent  increment.  The  middle  20%  of  the  data  are  shaded  in  a  light  orange  color. 

Figure 3: Experimental design showing the logarithmically transformed AG2034 (AG) dose versus the logarithmically
transformed trimetrexate (TMQ) dose in the fixed ratio experiments. A total of 16 curves are shown. Curves A and B are the
controls with no drugs. Curves C and D are single drug studies for TMQ and AG, respectively. Curves E through P are the
combination drug studies. Each curve has 11 dilutions shown in circles. Panel A: low folic acid medium. Panel B: high folic
acid medium.

Figure  4:  Distribution  of  the  effect  versus  dose  level  for  Curves  A  through  P  for  the  experiment  with  low  folic  acid  medium. 

Figure  5:  Distribution  of  the  effect  versus  dose  level  for  Curves  A  through  P  for  the  experiment  with  high  folic  acid  medium. 

Figure 6: Perspective plots (Panels A, B, and C), contour plots (Panels D and E), and image plot (Panel F) for the effect versus
logarithmically transformed doses of trimetrexate and AG2034 for the experiment with low folic acid medium.

Figure  7:  Perspective  plots  (Panels  A,  B,  and  C),  contour  plots  (Panels  D  and  E),  and  image  plot  (Panel  F)  for  the  effect  versus 
logarithmically  transformed  doses  of  trimetrexate  and  AG2034  for  the  experiment  with  high  folic  acid  medium. 

Figure  8:  Effect  versus  logarithmically  transformed  dose  plot  for  the  combination  study  of  trimetrexate  and  AG2034  with  low 
folic  acid  medium.  Raw  data  are  shown  in  open  circles.  The  blue  dashed  line  and  the  red  dotted  line  indicate  the  fitted  marginal 
dose  response  curves  for  trimetrexate  and  AG2034,  respectively.  The  black  solid  line  indicates  the  fitted  dose  response  curve  for 
the  combination  study  of  trimetrexate  and  AG2034. 

Figure 9: Trellis plot of the estimated interaction index (solid line) and its point-wise 95% confidence interval (red solid lines)
and  the  95%  simultaneous  confidence  band  (dashed  lines)  for  the  low  folic  acid  experiment.  The  estimates  at  the  design  points 
where  experiments  were  conducted  are  shown  in  red.  The  interaction  index  is  plotted  on  the  logarithmically  transformed  scale 
but  labeled  on  the  original  scale. 

Figure  10:  Effect  versus  logarithmically  transformed  dose  plot  for  the  combination  study  of  trimetrexate  and  AG2034  with  high 
folic  acid  medium.  Raw  data  are  shown  in  open  circles.  The  blue  dashed  line  and  the  red  dotted  line  indicate  the  fitted  marginal 
dose  response  curves  for  trimetrexate  and  AG2034,  respectively.  The  black  solid  line  indicates  the  fitted  dose  response  curve  for 
the  combination  study  of  trimetrexate  and  AG2034. 

Figure 11: Trellis plot of the estimated interaction index (solid line) and its pointwise 95% confidence interval (red solid lines)
and  the  95%  simultaneous  confidence  band  (dashed  lines)  for  the  high  folic  acid  experiment.  The  estimates  at  the  design  points 
where  experiments  were  conducted  are  shown  in  red.  The  interaction  index  is  plotted  on  the  logarithmically  transformed  scale 
but  labeled  on  the  original  scale. 

Abstract

In cancer, proto-oncogenes are often activated by genomic amplification. Here
we report recurrent focal amplifications of chromosome segment 4q12 overlapping the
oncogenes PDGFRA and KIT in non-small cell lung cancer (NSCLC). Single nucleotide
polymorphism (SNP) array and fluorescence in situ hybridization (FISH) analysis indicate
that 4q12 is amplified in 9% of lung squamous cell carcinomas and 3% of lung
adenocarcinomas. We further demonstrate that the lung squamous cell carcinoma cell
line NCI-H1703 exhibits focal amplification of PDGFRA and is dependent on PDGFRA
activity for cell growth. Treatment of NCI-H1703 cells with PDGFRA-specific shRNAs or
with the PDGFRA/KIT small molecule inhibitors imatinib and sunitinib leads to cell
growth inhibition. Together, these observations implicate PDGFRA and KIT as potential
oncogenes in NSCLC and present a novel opportunity for targeted therapy.


Introduction 

Lung cancer is the leading cause of cancer mortality in the United States and
worldwide. The majority of lung cancer cases are non-small cell lung cancers (NSCLC),
the most common forms of which are the two histological subtypes, adenocarcinoma
and squamous cell carcinoma. Despite advances in systemic therapies and surgical
techniques, 5-year survival rates for all types and stages of lung cancer remain low
(16%) (1).

Like other solid tumors, NSCLC cases are subject to large-scale rearrangements
leading to copy number gains and losses across the genome (2-4). Systematic
analyses of copy number alterations in lung adenocarcinoma have identified genes
such as EGFR, MYC, MDM2, TERT, NKX2-1, PIK3CA, and MET to be selectively
amplified (5-8). Other studies focusing on oncogenic point mutations have identified
recurrent mutations leading to aberrant activation of EGFR, KRAS, PIK3CA, ERBB2,
and BRAF among other genes (9-12). Furthermore, inactivating point mutations and
deletions in TP53, STK11, NF1, CDKN2A, and PTEN have been reported (13-17).
Most recently, mutations in several tyrosine kinase genes including PDGFRA and KDR
have also been reported (15). Unlike lung adenocarcinoma, the range of genetic
alterations in lung squamous cell carcinoma is less well understood. Activating deletions in
the extracellular domain of EGFR (EGFRvIII mutation) have been identified in 5% of
lung squamous cell carcinoma samples examined (18). In addition, chromosome 3q
amplifications encompassing PIK3CA and other genes have been found in 18% of lung
squamous cell carcinoma samples (19). Nonetheless, despite these efforts to
characterize the NSCLC genome, further work is needed to identify the complete
spectrum of genetic lesions involved in NSCLC pathogenesis.

Importantly, a recent study using a proteomic rather than genomic approach to
discover kinases activated in lung cancer identified phosphorylation of the receptor
tyrosine kinase PDGFRA in 5% (8/150) of primary NSCLC cases and in the lung
squamous cell carcinoma cell line NCI-H1703 (20). Treatment of NCI-H1703 with
imatinib, an FDA-approved PDGFRA and KIT inhibitor, resulted in apoptotic cell death.
Aberrant PDGFRA activation has been shown to play a tumorigenic role in
gastrointestinal stromal tumors (GIST) and several brain tumor types (21, 22).
Constitutively activating point mutations in PDGFRA are found in 5% of GIST cases.
Additionally, PDGFRA is amplified in glioblastoma multiforme and other malignant brain
tumors. However, in NSCLC the genetic basis for PDGFRA dependency is unclear.

We have therefore investigated the role of PDGFRA in NSCLC etiology using a
combination of copy number analyses in primary samples and in vitro experiments in
cell line models. Here, we demonstrate that PDGFRA and the neighboring
kinase KIT are recurrently amplified in NSCLC at a frequency of 9% in squamous cell
carcinoma and 3% in adenocarcinoma. The role of KIT in NSCLC tumorigenesis
remains unclear due to the absence of cell line models that harbor endogenous KIT
amplification in which to perform functional testing. However, using the cell line
NCI-H1703, we highlight the role of PDGFRA as a novel prospective oncogene in NSCLC
and its potential as a target for novel therapeutic modalities in lung cancer.


Materials  and  Methods 

NSCLC  primary  samples  and  cell  lines 

Genomic DNA was extracted from 74 fresh frozen primary tumors and 84 cell
lines. Primary samples were collected from six different sites: Memorial Sloan-Kettering
Cancer Center (2 tumors), University of Michigan (1 tumor), Washington University in
St. Louis (3 tumors), Dana-Farber Cancer Institute/The Broad Institute (8 tumors),
Brigham and Women's Hospital tissue bank (25 tumors), and the University Health
Network in Toronto (35 tumors). Cell lines were obtained from ATCC (26 cell lines),
DSMZ (5 cell lines), Dana-Farber Cancer Institute/The Broad Institute (1 cell line), the NCI
Developmental Therapeutics Program (3 cell lines), and J.D. Minna at University of
Texas Southwestern Medical Center (49 cell lines). Additionally, raw Affymetrix 250K
SNP array data from 554 primary adenocarcinomas as published in Weir et al. (5) and
22 NSCLC cell lines from the NCI caArray open-source database were utilized.

SNP  array  experiments  and  analysis 

SNP array experiments were performed on 734 NSCLC tumor and cell line
samples as described in Weir et al. (5). Data were analyzed using GISTIC as described
in Beroukhim et al. (23). Briefly, genomic DNA was genotyped using the Sty I chip of the
500K Human Mapping Array set (Affymetrix Inc.) at the Broad Institute. Raw probe
intensities were processed using the GenePattern software package; copy number was
computed by dividing the intensity of each probeset by the mean value of that probeset
in the five closest normals by Euclidean distance (see Beroukhim et al.). Normalized
data were segmented with GenePattern modules based on the GLAD algorithm.
G-scores derived from GISTIC were obtained for each SNP probe across chromosome 4.
Only amplifications exceeding a log2 ratio of 0.3 were included. G-scores were compared
against a null model generated by random permutations to determine a false discovery
rate (q-value). Peaks with q-values below 0.25 were considered significant. Both the
peak region determined by minimal common overlap (chr4:54781155-54868471, probes
57234:57238) and the wide peak determined by a leave-one-out approach
(chr4:54758116-55357275, probes 57231:57300) are reported.
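
The normalization and amplification threshold described in this subsection can be illustrated with a short sketch. It is not the GenePattern or GISTIC implementation: the function names, the use of whole-profile Euclidean distance to choose the normals, and the diploid-reference conversion in `inferred_copies` are assumptions made for illustration only. Under that diploid assumption, a log2 ratio of 0.3 corresponds to roughly 2 x 2^0.3, or about 2.46 copies.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length intensity profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def copy_number_ratios(tumor, normals, k=5):
    """Divide each tumor probeset intensity by the mean intensity of the
    same probeset in the k normal profiles closest to the tumor profile."""
    closest = sorted(normals, key=lambda n: euclidean(tumor, n))[:k]
    ratios = []
    for i, intensity in enumerate(tumor):
        reference = sum(n[i] for n in closest) / len(closest)
        ratios.append(intensity / reference)
    return ratios


def passes_threshold(ratios, log2_threshold=0.3):
    """Flag probesets whose log2 ratio exceeds the amplification threshold."""
    return [math.log2(r) > log2_threshold for r in ratios]


def inferred_copies(ratio):
    """Convert a tumor/normal ratio to copies, assuming a diploid reference."""
    return 2.0 * ratio
```

In this sketch a probeset with twice the reference intensity has a log2 ratio of 1 and is flagged, while a probeset equal to the reference has a log2 ratio of 0 and is not. Segmentation (GLAD) and the permutation-based q-values are deliberately left out.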

Tissue microarray fluorescence in situ hybridization (TMA-FISH)

To assess for PDGFRA amplification, a probe spanning PDGFRA (chr4q12, 99.3
kb) and a reference probe spanning a stable region identified by SNP data in NSCLC
(chr4q22.3-q23, 193 kb) were used. For the PDGFRA target probe, the Biotin-14-dCTP
labeled BAC clone CTD-2054G11 (conjugated to produce a red signal) was applied.
For the reference probe, the Digoxigenin-11-dUTP labeled BAC clone RP11-799A12
(conjugated to produce a green signal) was applied. Correct chromosomal probe
localization was confirmed on normal lymphocyte metaphase preparations. BAC clones
were obtained from the BACPAC Resource Center, Children's Hospital Oakland
Research Institute (CHORI) (Oakland, CA) and Invitrogen (Carlsbad, CA). Tissue
hybridization, washing, and color detection were performed as described previously
(39). PDGFRA amplification by FISH was assessed in 171 samples (represented by
497 tissue microarray cores). At least one TMA core could be evaluated per case. The
samples were analyzed under a 60x oil immersion objective using an Olympus BX-51
fluorescence microscope, a CCD (charge-coupled device) camera and the CytoVision
FISH imaging and capturing software (Applied Imaging, San Jose, CA).
Semi-quantitative evaluation of the tests was independently performed by two evaluators
(A.R., S.P.). For each case, we attempted to analyze at least 100 nuclei.

Cell  culture  and  reagents 

The NSCLC cell lines NCI-H1703 and HCC15 were obtained from ATCC
(Manassas, Virginia, United States) and DSMZ (Braunschweig, Germany), respectively.
Cells were maintained in RPMI 1640 complete media supplemented with 10% calf
serum (Gibco/Invitrogen, Carlsbad, California, United States) and penicillin/streptomycin
(Gibco/Invitrogen). Unless otherwise noted, cells were placed in media containing 0.5%
calf serum 24 h prior to stimulation with 100 ng/mL PDGF (#9909, Cell Signaling Technologies, Danvers,
MA, United States) for 20 minutes at 37 °C. Imatinib and sunitinib were
purchased from LC Laboratories (Woburn, Massachusetts, United States) and diluted in
DMSO to the indicated concentrations.

shRNA  mediated  PDGFRA  knockdown 

shRNA vectors targeted against PDGFRA and GFP were obtained from TRC
(The RNAi Consortium). The target sequences of the PDGFRA shRNA constructs are:

PDGFRA #1 (TRCN0000001422): 5'-CCCAACTTTCTTATCCAACTT-3'.

PDGFRA #2 (TRCN0000001423): 5'-CCAGCCTCATATAAGAAGAAA-3'.

PDGFRA #3 (TRCN0000001424): 5'-CCAGCTTTCATTACCCTCTAT-3'.

PDGFRA #4 (TRCN0000001425): 5'-CGGTGAAAGACAGTGGAGATT-3'.

PDGFRA #5 (TRCN0000001426): 5'-CAATGGACTTACCCTGGAGAA-3'.

The sequence targeted by the GFP shRNA is 5'-GCAAGCTGACCCTGAAGTTCAT-3'.
Lentiviruses were made by transfection of 293T packaging cells with these constructs
using a three-plasmid system as previously described (41). Target cells were incubated
with lentiviruses for 6 hours in the presence of 8 µg/ml polybrene and left in fresh media.
Two days after infection, puromycin (2 µg/ml for NCI-H1703 and HCC15) was added.
Cells were grown in the presence of puromycin for four days. Fifty micrograms of total
cell lysate prepared from the puromycin-selected cell lines was analyzed by Western blotting
using anti-PDGFRA monoclonal antibody (#sc-31166, Santa Cruz Biotechnology),
anti-phospho-PDGFRA monoclonal antibody (#sc-12910, Santa Cruz Biotechnology) and
anti-Actin monoclonal antibody (#sc-1615, Santa Cruz Biotechnology).

Cell  survival  assays  with  tumor  cell  lines  expressing  shPDGFRA  and  shGFP 
constructs. 

Eight hundred cells of each tumor cell line expressing shRNAs targeting PDGFRA or GFP,
along with uninfected cells, were seeded in 6 wells of a 96-well plate. Cell viability was
determined at 24-hour time points for 4 consecutive days using the WST-1 assay
(Roche Applied Science). For each cell line, cell viability is plotted as the percentage of
the Day 3 reading relative to the Day 1 reading.
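
One way to read the Day 3 versus Day 1 normalization is sketched below. The function name and the choice to average the replicate wells are assumptions; the text does not state how replicate readings were combined for this assay.

```python
from statistics import mean


def percent_viability(day1_readings, day3_readings):
    """Day 3 WST-1 signal as a percentage of the Day 1 signal.

    Each argument holds the absorbance readings of the replicate wells
    for one cell line on that day. Averaging replicates is an assumption.
    """
    baseline = mean(day1_readings)
    if baseline <= 0:
        raise ValueError("Day 1 baseline must be positive")
    return 100.0 * mean(day3_readings) / baseline
```

With this convention, a value above 100 means the WST-1 signal rose between Day 1 and Day 3, and a knockdown that impairs survival would show a lower percentage than the shGFP control.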

Soft  agar  anchorage-independent  growth  assay  with  tumor  cell  lines  expressing 
shPDGFRA  and  shGFP  constructs. 

NCI-H1703 and HCC15 cells expressing shPDGFRA and shGFP were
suspended in a top layer of RPMI 1640 containing 10% calf serum and 0.4% Select agar
(Gibco/Invitrogen, Carlsbad, California, United States) and plated on a bottom layer of
RPMI 1640 containing 10% calf serum and 0.5% Select agar. Imatinib or sunitinib was
added as described to the top agar. After 3 weeks of incubation for HCC15 cells and 5
weeks for NCI-H1703 cells, colonies were counted in triplicate. IC50s were determined
by nonlinear regression using the GraphPad Prism software.

Cytotoxicity assays

Lung cancer cell lines were treated with imatinib or sunitinib one day after plating
and cell survival was assessed 4 days later using the WST-1 assay (Roche,
http://www.roche.com). Each data point represents the median of six replicate wells for
each tumor cell line and inhibitor concentration. IC50s were determined by nonlinear
regression using the GraphPad Prism software.
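
The IC50 estimation step can be sketched as follows. This is not the Prism algorithm: the two-parameter inhibitory Hill model (top fixed at 100%, bottom at 0%), the brute-force grid search used in place of an iterative least-squares solver, and all names are assumptions chosen to keep the example dependency-free. Only the use of the median of replicate wells comes from the text.

```python
import math
from statistics import median


def hill(conc, ic50, slope):
    """Percent survival under a two-parameter inhibitory Hill model."""
    return 100.0 / (1.0 + (conc / ic50) ** slope)


def fit_ic50(concs, replicate_wells, steps=400):
    """Least-squares fit of IC50 and Hill slope by grid search.

    concs           -- inhibitor concentrations (all > 0, same unit)
    replicate_wells -- one list of percent-survival readings per concentration
    """
    observed = [median(wells) for wells in replicate_wells]
    lo, hi = math.log10(min(concs)), math.log10(max(concs))
    best = None
    for i in range(steps + 1):
        # IC50 candidates are log-spaced across the tested concentration range.
        ic50 = 10 ** (lo + (hi - lo) * i / steps)
        for j in range(1, 41):
            slope = j / 10  # slopes 0.1, 0.2, ..., 4.0
            sse = sum((hill(c, ic50, slope) - y) ** 2
                      for c, y in zip(concs, observed))
            if best is None or sse < best[0]:
                best = (sse, ic50, slope)
    return best[1], best[2]
```

The grid resolution limits precision to about 2% in IC50 with the default settings; a dedicated curve-fitting package would refine the estimate and also report confidence intervals.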

Immunoblotting 

Cells were lysed in a buffer containing 50 mM Tris-HCl (pH 7.4), 150 mM NaCl,
2.5 mM EDTA, 1% Triton X-100, and 0.25% IGEPAL. Protease inhibitors (Roche,
http://www.roche.com) and phosphatase inhibitors (Calbiochem, La Jolla, CA, United
States) were added prior to use. Samples were normalized for total protein content.
Lysates were boiled in sample buffer, separated by SDS-PAGE on 8% polyacrylamide
gels, transferred to PVDF membrane, and probed as described. Antibodies used for
immunoblotting were: anti-PDGFRA monoclonal antibody (#sc-31166, Santa Cruz
Biotechnology), anti-phospho-PDGFRA monoclonal antibody (#sc-12910, Santa Cruz
Biotechnology) and anti-Actin monoclonal antibody (#sc-1615, Santa Cruz
Biotechnology).

Results

PDGFRA  and  KIT  are  amplified  in  non-small  cell  lung  cancer 

To determine if PDGFRA is recurrently amplified in NSCLC, 734 NSCLC
samples (628 primary samples, 106 cell lines) were evaluated for copy number
aberrations with Affymetrix 250K SNP arrays (Fig. 1A, Supplementary Table S1). Using
the GISTIC (Genomic Identification of Significant Targets in Cancer) algorithm (23), a
600 kb region on 4q12 (54.76 to 55.36 Mb) was found to be significantly amplified. The
sole genes within this region are PDGFRA and the closely related receptor tyrosine
kinase KIT.

Visual inspection revealed amplifications at 4q12 overlapping the PDGFRA/KIT
locus in 31 (4.2%) NSCLC samples (Figure 1B, Supplementary Table S2). The majority
(93%; 29/31) of these amplifications were relatively focal events (<50% of the length of
chromosome 4q), suggesting that selective amplification of target genes is occurring.
Comparing NSCLC subtypes, 8.7% (5/57) of squamous cell carcinomas and 3.5%
(21/588) of adenocarcinomas showed amplifications, suggesting that 4q12 is amplified
at appreciable frequencies across both major NSCLC subtypes. The inferred copy
number and length of focal amplifications ranged from 2.47 to 10.24 copies (median =
2.8 copies) and from 0.45 to 48.4 Mb (median = 7.55 Mb), respectively. Here,
non-integer copy number values are the result of smoothing across multiple SNP probes
using the GLAD (Gain and Loss Analysis of DNA) segmentation algorithm (24). The
only previously described oncogenes in these focally amplified regions are PDGFRA
and KIT. Interestingly, our group has found recurrent point mutations in KDR, a
receptor tyrosine kinase located adjacent to KIT, as well as in PDGFRA and rarely in KIT
(15). In our data, KDR is often amplified with PDGFRA and KIT (28/31 samples) but it
does not fall within the GISTIC region of statistical significance.

Most of the samples (26/29) with focal amplification at 4q12 had amplicons
spanning both PDGFRA and KIT. However, three samples had amplicon breakpoints
between PDGFRA and KIT and amplified only one of the two genes, suggesting that only
one of these genes is necessary for NSCLC tumorigenesis. Two primary
adenocarcinoma samples, SM-11SU and SM-11U9, are PDGFRA-normal/KIT-amplified, while
the lung squamous cell carcinoma cell line NCI-H1703 is PDGFRA-amplified/KIT-normal
(Supplementary Fig. S1). Importantly, the lung squamous cell carcinoma cell line
NCI-H1703 has recently been shown to overexpress phosphorylated PDGFRA protein and
to exhibit activated MAP kinase pathway signaling (20). To identify any possible
activating point mutations or insertions/deletions in NCI-H1703, the coding region of
PDGFRA cDNA was sequenced, but no somatic alterations were detected (data not
shown).

To validate our initial findings, we performed fluorescence in situ hybridization
(FISH) in an independent set of 171 primary NSCLC tumor samples using a bacterial
artificial chromosome (BAC) probe overlapping PDGFRA (Fig. 1C). A BAC probe
overlapping a non-amplified genomic region (4q21), as indicated by SNP array analysis,
was used as a control. Of the 171 NSCLC samples evaluated, 11 (6%) were found to
have amplification of the PDGFRA locus. Within subtypes, PDGFRA amplification was
observed in 3.0% (2/66) of adenocarcinoma and 9.3% (9/96) of squamous cell
carcinoma samples, closely mirroring the results observed by SNP array analysis
(Supplementary Table S3). Interestingly, two different amplification phenotypes were
observed. Seven samples exhibited high-level amplification (CN >10) of PDGFRA in
~5% of tumor cells within the sample. Conversely, four samples with PDGFRA gain
showed lower levels of amplification (CN 4-8) present in the majority (>50%) of a
sample's tumor cells.


PDGFRA  is  essential  for  tumor  cell  survival 

Unregulated expression of oncogenes has been shown to be necessary for
tumor cell proliferation or viability. Given this dependency upon continued oncogenic
signaling, termed oncogene addiction, tumor cells with genetically altered oncogenes
can often be effectively treated with targeted agents (25). Six cell lines harboring 4q12
copy number gain, whether broad or focal, were tested for PDGFRA expression
(Supplementary Fig. S2). Of these, NCI-H1703 cells highly expressed PDGFRA and
were studied further. In order to demonstrate a role for PDGFRA in the survival of
NSCLC with PDGFRA amplification, we tested a series of shRNA constructs in
NCI-H1703 cells and control HCC15 cells without 4q12 amplification. As shown in Fig. 2A,
three out of five short hairpin RNAs were found to significantly knock down
PDGFRA expression in NCI-H1703 cells. This knockdown of PDGFRA inhibited cell
survival and anchorage-independent growth in NCI-H1703 cells, but not in the HCC15
cells (Fig. 2B and Supplementary Fig. S3A). Together, these observations implicate
PDGFRA as an essential gene in a subset of NSCLC samples, similar to other known
oncogenes, such as EGFR and FGFR2, that can render oncogene-expressing cells
dependent upon their activation (26, 27).


Effect of PDGFRA kinase inhibitors on PDGFRA-overexpressing cells

We then investigated whether inhibition of PDGFRA kinase activity with small
molecule inhibitors could be effective against NCI-H1703 cells. To this end, we
examined the effect of two small molecule tyrosine kinase inhibitors, imatinib and
sunitinib, that are approved for the treatment of leukemia, GIST, and advanced renal
cell carcinoma (28-30). Whereas imatinib specifically inhibits the tyrosine kinase activity of
ABL, KIT, and PDGFRA, sunitinib is a multi-targeted tyrosine kinase inhibitor of
VEGFR-family receptors, PDGFR-family receptors, KIT, RET, FLT3, and CSF-1R (31, 32).
Treatment with 2 µM imatinib or sunitinib for 40 minutes at 37 °C inhibited the constitutive
phosphorylation of PDGFRA in NCI-H1703 cells harboring amplification of PDGFRA
(Fig. 3A), suggesting that both inhibitors inhibited PDGFRA kinase activity. Consistent
with this in vitro effect, treatment with imatinib or sunitinib also resulted in a marked
decrease of cell survival in culture as determined by WST-based cell proliferation
assays, with IC50s of 20 nM and 74 nM, respectively (Fig. 3B and 3C). Furthermore,
imatinib inhibited anchorage-independent colony formation in soft agar at an IC50 of 80
nM, while sunitinib inhibited colony formation at an IC50 of 200 nM. Notably, treatment
of HCC15 cells with imatinib or sunitinib had little to no effect on cell survival or colony
formation ability (Supplementary Fig. S4A and S4B).

Discussion

The advent of targeted cancer therapeutics emphasizes the significance of
identifying genetic alterations that lead to oncogene dependency in tumors. Notably,
discovery of tumorigenic NSCLC somatic alterations in proto-oncogenes with existing
targeted therapeutic approaches in other cancer types could provide opportunities for
immediate adoption of clinically approved treatments.

We have identified recurrent focal amplifications of 4q12 in NSCLC. To our
knowledge, this is the first description of 4q12 amplification in NSCLC. Systematic
statistical analysis suggests that the oncogenes PDGFRA and KIT are the targets of
these copy number gains. KDR, a VEGFR-family receptor involved in tumor
angiogenesis, is adjacent to KIT and is also often amplified. Even though it does not lie
within the genomic region of statistical significance, it is possible that KDR is also a
target of 4q12 gain. PDGFRA, KIT, and KDR are often amplified together in brain
tumors (33, 34). In GISTs, PDGFRA and KIT are activated by point mutations (21, 35)
and therapies targeting these kinases have proved highly efficacious. Point mutations
have also been found in PDGFRA, KIT, and KDR in lung adenocarcinomas; however, the
oncogenicity of these mutations is not known (15).

We further demonstrate that the lung squamous cell carcinoma cell line NCI-H1703
focally amplifies PDGFRA and is dependent on PDGFRA signaling for cell growth. This
observation indicates that focal amplifications at 4q12 can lead to aberrant PDGFRA
activation and subsequent oncogenic signaling. The role of KIT in samples with 4q12
amplification remains unclear due to a lack of appropriate cell line models. Importantly,
amplification of PDGFRA and KIT presents a potential opportunity for targeted therapy
in a subset of NSCLC patients. Treatment of NCI-H1703 cells with PDGFRA-specific
shRNAs and the small molecule inhibitors imatinib or sunitinib leads to cell death.
Interestingly, a phase II clinical trial of sunitinib in previously treated NSCLC patients
demonstrated an objective response rate of 11% (7/63 patients) (36). It remains to be
seen whether NSCLC patients responsive to sunitinib harbor 4q12 amplification.

The use of a single therapeutic agent to target multiple kinases within a tumor
has been previously suggested (37, 38). It is not known whether 4q12 amplification in
NSCLC samples leads to simultaneous activation of both PDGFRA and KIT or only one
of the two. Nonetheless, both kinases are known targets of imatinib and sunitinib, and it
could be presumed that both kinases could be inhibited with one small molecule
inhibitor. Thus it is possible that 4q12 amplification in NSCLC combined with
immunohistochemistry for PDGFRA and/or KIT could be sufficient as a marker for
sensitivity to imatinib and sunitinib regardless of which oncogenes are activated. These
data argue that further testing of these agents in NSCLC patients with 4q12 gain may
lead to important advances in the care of these patients.


Acknowledgements 

We thank Shantanu Banerji for critical reading of the manuscript. A.D. is supported by
the Swiss National Science Foundation Fellowship #PBZHB-106297. This work was
supported by National Cancer Institute grants 5R01CA109038 and 5P20CA90578
(M.M.).




References 

1. Jemal A, Siegel R, Ward E, et al. Cancer statistics, 2008. CA Cancer J Clin
2008;58(2):71-96.

2. Weir B, Zhao X, Meyerson M. Somatic alterations in the human cancer genome.
Cancer Cell 2004;6(5):433-8.

3. Tonon G, Brennan C, Protopopov A, et al. Common and contrasting genomic
profiles among the major human lung cancer subtypes. Cold Spring Harb Symp Quant
Biol 2005;70:11-24.

4. Lockwood WW, Chari R, Coe BP, et al. DNA amplification is a ubiquitous
mechanism of oncogene activation in lung and other cancers. Oncogene
2008;27(33):4615-24.

5. Weir BA, Woo MS, Getz G, et al. Characterizing the cancer genome in lung
adenocarcinoma. Nature 2007;450(7171):893-8.

6. Kwei KA, Kim YH, Girard L, et al. Genomic profiling identifies TITF1 as a
lineage-specific oncogene amplified in lung cancer. Oncogene 2008;27(25):3635-40.

7. Yamamoto H, Shigematsu H, Nomura M, et al. PIK3CA mutations and copy
number gains in human lung cancers. Cancer Res 2008;68(17):6913-21.

8. Engelman JA, Zejnullahu K, Mitsudomi T, et al. MET amplification leads to
gefitinib resistance in lung cancer by activating ERBB3 signaling. Science
2007;316(5827):1039-43.

9. Davies H, Hunter C, Smith R, et al. Somatic mutations of the protein kinase gene
family in human lung cancer. Cancer Res 2005;65(17):7591-5.




10. Samuels Y, Wang Z, Bardelli A, et al. High frequency of mutations of the PIK3CA
gene in human cancers. Science 2004;304(5670):554.

11. Stephens P, Hunter C, Bignell G, et al. Lung cancer: intragenic ERBB2 kinase
mutations in tumours. Nature 2004;431(7008):525-6.

12. Naoki K, Chen TH, Richards WG, Sugarbaker DJ, Meyerson M. Missense
mutations of the BRAF gene in human lung adenocarcinoma. Cancer Res
2002;62(23):7001-3.

13. Takahashi T, Nau MM, Chiba I, et al. p53: a frequent target for genetic
abnormalities in lung cancer. Science 1989;246(4929):491-4.

14. Sanchez-Cespedes M, Parrella P, Esteller M, et al. Inactivation of LKB1/STK11
is a common event in adenocarcinomas of the lung. Cancer Res 2002;62(13):3659-62.

15. Ding L, Getz G, Wheeler DA, et al. Somatic mutations affect key pathways in
lung adenocarcinoma. Nature 2008;455(7216):1069-75.

16. Packenham JP, Taylor JA, White CM, Anna CH, Barrett JC, Devereux TR.
Homozygous deletions at chromosome 9p21 and mutation analysis of p16 and p15 in
microdissected primary non-small cell lung cancers. Clin Cancer Res 1995;1(7):687-90.

17. Forgacs E, Biesterveld EJ, Sekido Y, et al. Mutation analysis of the
PTEN/MMAC1 gene in lung cancer. Oncogene 1998;17(12):1557-65.

18. Ji H, Zhao X, Yuza Y, et al. Epidermal growth factor receptor variant III mutations
in lung tumorigenesis and sensitivity to tyrosine kinase inhibitors. Proc Natl Acad Sci U
S A 2006;103(20):7817-22.

19. Okudela K, Suzuki M, Kageyama S, et al. PIK3CA mutation and amplification in
human lung cancer. Pathol Int 2007;57(10):664-71.




20.  Rikova  K,  Guo  A,  Zeng  Q  ,  et  al.  Global  s  urvey  of  phosphotyrosine  signaling 
identifies  oncogenic  kinases  in  lung  cancer.  Cell  2007;131(6):1 190-203. 

21.  Hirota  S,  Ohashi  A,  Nishida  T  ,  et  al.  Gain-of-function  mutations  of  pla  telet- 
derived  growth  factor  recept  or  alpha  gene  in  gastroin  testinal  stromal  tumors 
Gastroenterology  2003;125(3):660-7. 

22.  Fleming  TP,  Saxena  A,  Clark  WC  ,  et  al.  Amplification  and/or  overexpress  ion  of 
platelet-derived  growth  factor  receptors  and  epidermal  growth  factor  receptor  in  human 
glial  tumors.  Cancer  Res  1992;52(16):4550-3. 

23.  Beroukhim  R,  Getz  G,  Nghiemphu  L  ,  et  al.  Assessing  the  significance  of 
chromosomal  aberrations  in  cancer:  methodolog  y  and  application  to  glioma.  Proc  Natl 
Acad  Sci  U  S  A  2007;104(50):20007-12. 

24.  Hupe  P,  Stransky  N,  Thiery  JP,  Radvany  i  F,  Barillot  E.  Analysis  of  array  CGH 

data:  from  signal  ratio  to  gain  and  loss  of  DNA  regions.  Bioinformatic  s 

2004;20(1 8):341 3-22. 

25.  Sharma  SV,  Settleman  J.  Oncogene  addi  ction:  setting  the  stage  for  molecularly 
targeted  cancer  therapy.  Genes  Dev  2007;21(24):3214-31. 

26. Greulich H, Chen TH, Feng W, et al. Oncogenic transformation by inhibitor-
sensitive  and  -resistant  EGFR  mutants.  PLoS  Med  2005;2(11):e313. 

27. Dutt A, Salvesen HB, Chen TH, et al. Drug-sensitive FGFR2 mutations in
endometrial carcinoma. Proc Natl Acad Sci U S A 2008;105(25):8713-7.

28. Wardelmann E, Merkelbach-Bruse S, Pauls K, et al. Polyclonal evolution of
multiple secondary KIT mutations in gastrointestinal stromal tumors under treatment
with imatinib mesylate. Clin Cancer Res 2006;12(6):1743-9.


29. van Oosterom AT, Judson I, Verweij J, et al. Safety and efficacy of imatinib
(STI571) in metastatic gastrointestinal stromal tumours: a phase I study. Lancet
2001;358(9291):1421-3.

30. Motzer RJ, Hutson TE, Tomczak P, et al. Sunitinib versus interferon alfa in
metastatic renal-cell carcinoma. N Engl J Med 2007;356(2):115-24.

31. Christensen JG. A preclinical review of sunitinib, a multitargeted receptor tyrosine
kinase  inhibitor  with  anti-angiogenic  and  antitumour  activities.  Ann  Oncol  2007;18  Suppl 
10:x3-10. 

32. Karaman MW, Herrgard S, Treiber DK, et al. A quantitative analysis of kinase
inhibitor  selectivity.  Nat  Biotechnol  2008;26(1):127-32. 

33. Joensuu H, Puputti M, Sihto H, Tynninen O, Nupponen NN. Amplification of
genes encoding KIT, PDGFRalpha and VEGFR2 receptor tyrosine kinases is frequent
in  glioblastoma  multiforme.  J  Pathol  2005;207(2):224-31. 

34. Puputti M, Tynninen O, Sihto H, et al. Amplification of KIT, PDGFRA, VEGFR2,
and  EGFR  in  gliomas.  Mol  Cancer  Res  2006;4(12):927-34. 

35.  Singer  S,  Rubin  BP,  Lux  ML,  et  al.  Prognostic  value  of  KIT  mutation  type,  mitotic 
activity, and histologic subtype in gastrointestinal stromal tumors. J Clin Oncol
2002;20(18):3898-905.

36. Socinski MA, Novello S, Brahmer JR, et al. Multicenter, phase II trial of sunitinib
in previously treated, advanced non-small-cell lung cancer. J Clin Oncol
2008;26(4):650-6. 


37. Sartore-Bianchi A, Ricotta R, Cerea G, Maugeri MR, Siena S. Rationale and
clinical  results  of  multi-target  treatments  in  oncology.  Int  J  Biol  Markers  2007;22(1  Suppl 
4):S77-87. 

38.  Petrelli  A,  Giordano  S.  From  single-  to  multi-target  drugs  in  cancer  therapy:  when 
aspecificity  becomes  an  advantage.  Curr  Med  Chem  2008;15(5):422-32. 

39.  Perner  S,  Wagner  P,  Soltermann  A,  et  al.  TTF1  expression  in  non-small  cell  lung 
carcinoma: association with TTF1 gene amplification and improved survival. J Pathol
2008. 


Figure  Legends 

Figure 1. Recurrent genomic amplifications of PDGFRA and KIT in NSCLC samples.
A, smoothed copy number estimates within chromosome arm 4q in top 200 NSCLC
samples (columns; ordered by amplification of 4q12). The color scale ranges from blue
(deletion) to red (amplification) with estimated copy numbers shown. Grey regions
represent the centromere or absence of SNP copy number data. Plotted GISTIC G-
scores on the right are from all available samples. The green line on the GISTIC plot
represents a significance threshold of 0.25 false discovery rate q-value. B, magnified
view of smoothed copy number estimates from the centromere to 61 Mb on
chromosome 4 from 31 NSCLC samples having amplification greater than 2.46 copies
(log2 ratio of 0.3) at 4q12. Samples are sorted according to the maximum copy number
estimate for PDGFRA and KIT. Solid and dashed lines indicate positions of PDGFRA
and KIT, respectively. Color scale as in panel A. C, FISH for PDGFRA (red) and
chromosome 4 reference probe (green) displaying high-level and low-level gain of
PDGFRA in the lung squamous cell carcinoma samples CTMA4 and TMA148,
respectively. A lung adenocarcinoma sample, CTMA11, with no amplification at
PDGFRA is shown on the right for reference. Nuclei are stained with 4,6-diamidino-2-
phenylindole (DAPI; blue).

Figure 2. PDGFRA-amplified NSCLC cells are addicted to PDGFRA activity. A,
PDGFRA expression in NCI-H1703 and HCC15 was confirmed by immunoblotting using
actin as a loading control (left panel). shRNA constructs used to knock down PDGFRA
expression were packaged into lentivirus and used to infect NCI-H1703 and HCC15
cells. Anti-PDGFRA immunoblot shows that hairpins #1, #3 and #5 efficiently knock
down endogenous PDGFRA expression in NCI-H1703 cells. Actin is included as a
loading control. NI, no infection. shGFP, control hairpin specific for green fluorescent
protein used as a negative control (right panel). B and C, infection with three
independent hairpins (#1, #3 and #5) did not inhibit cell survival of HCC15 cells as
assessed by WST assay (B) but did inhibit survival of NCI-H1703 cells overexpressing
wild type PDGFRA (C). All results normalized to survival of cells infected with shGFP.


Figure 3. PDGFRA tyrosine kinase activity is essential in NSCLC cells. A, PDGFRA is
constitutively phosphorylated, with or without PDGF ligand, in NCI-H1703 cells, as
compared with HCC15 cells. Stimulation with PDGF was carried out for 20 minutes at
37°C with 100 ng/ml of PDGF. Treatment of these cell lines for 40 minutes with 2 µM
PDGFR kinase inhibitor imatinib and sunitinib inhibits basal phosphorylation, as
evidenced by immunoblotting with anti-phospho-PDGFRA (upper panel). Similar levels
of expression of PDGFRA are confirmed by immunoblotting with anti-PDGFRA (middle
panel) using actin as a loading control (lower panel). B and C, treatment with the
indicated concentrations of imatinib and sunitinib inhibited survival of NCI-H1703 cells,
but not of HCC15 cells, as determined by WST assay performed after 4 days treatment.
IC50s are indicated.

Supplementary Materials

Figure S1. Raw (black) and smoothed (red) copy number data for NCI-H1703 defining
a 4q12 amplification of PDGFRA. Estimated copy number values (y axis) are plotted
according to position on chromosome 4 (x axis). Genomic positions of SCFD2, FIP1L1,
LNX1, CHIC2, PDGFRA, and KIT are shown along the x axis.

Figure S2. Western blot analysis of PDGFRA in six different 4q12 amplified (NCI-
H1703, NCI-H661, NCI-H1819, NCI-H1838, NCI-H23 and HCC366) and one non-
amplified (HCC15) NSCLC cell lines. NCI-H1703 cells show increased PDGFRA
expression as compared to other NSCLC cell lines.

Figure S3. Anchorage independent growth of NCI-H1703 cells is dependent on
PDGFRA activity. A and B, infection with three independent hairpins (#1, #3 and #5)
inhibited colony formation in soft agar in NCI-H1703 cells overexpressing wild type
PDGFRA, but not HCC15 cells. All results are normalized to survival or colony formation
by cells infected with shGFP.

Figure S4. Treatment of NCI-H1703 cells with kinase inhibitors decreases colony
formation ability. A and B, treatment of NCI-H1703 cells with imatinib and sunitinib
resulted in a marked decrease in colony formation in soft agar with IC50s in the 20 nM
and 81 nM range, respectively, whereas similar treatment of the HCC15 cell line
without PDGFRA amplification had no significant effect.


ABSTRACT


Background:  With  first  order  kinetics,  inflection  points  on  semilog  plots  imply  different  limiting  processes. 
Methods:  Kaplan-Meier  overall  and  progression-free  survival  curve  heights  from  selected  NSCLC  publications 
were  measured  manually.  Nonlinear  exponential  decay  was  assessed  using  GraphPad  Prism. 

Results: Our preliminary observations, if confirmed, would suggest the following: Palliative front-line
chemotherapy: Twelve of 15 curves for untreated controls and single agents were fit by 2-3-phase decay
models  while  42  of  48  curves  for  multidrug  regimens  were  fit  only  by  one-phase  models.  Rapid-decay-phase 
curves  were  convex  in  0%  vs  54%,  respectively  (p<0.001).  The  paucity  of  inflection  points  suggests  outcome  is 
driven  primarily  by  continuous  variables.  Hence,  individual  patient  outcome  might  be  predicted  better  using 
continuous  variables  than  by  dichotomizing  variables.  Curve  convexities  suggest  discontinuation  of 
combination  chemotherapy  after  4-6  cycles  “synchronizes”  patient  death.  Characterizing  patients  dying  along 
the  leading  convex  edge  might  identify  subgroups  that  would  benefit  from  maintenance  therapy.  Adjuvant 
Chemotherapy:  Overall  half-life  was  longer  with  adjuvant  chemotherapy  than  with  matched  controls  (p=0.03), 
apparently  more  from  a  shift  of  rapid-phase  patients  into  the  potentially-cured  slow-decay-phase  fraction  than 
from  prolongation  of  survival  of  patients  who  die  despite  therapy.  Stage:  Two-phase-decay  curves 
predominated for stage II-IV populations. Rapid-decay-phase half-life shortened while rapid-phase size
increased  with  increasing  stage  (p<0.04),  suggesting  molecular  characteristics  that  drive  tumor  cell  growth 
rates  determine  not  only  patient  survival  time  but  also  stage  at  presentation. 

Conclusions:  Future  studies  will  explore  adaptations  of  mixture  distribution  or  nonlinear  mixed  effects  modelling 
using  individual  patient  data  for  multivariate  nonlinear  exponential  decay  survival  analyses. 


INTRODUCTION: 


Survival  analyses  in  cancer  clinical  trials  generally  use  Kaplan-Meier  plots  in  reporting  survival.  Differences 
between  groups  are  generally  assessed  by  comparing  median  overall  survival  time  (OS),  progression-free 
survival  time  (PFS),  proportion  of  patients  alive  at  a  particular  time  (eg,  1  year)  or  by  calculating  hazard  ratios, 
etc. 

Biological  processes  such  as  drug  disappearance1  and  enzymatic  reactions2  may  follow  first  order  kinetics, 
with  disappearance  of  a  given  proportion  of  remaining  drug,  substance,  etc,  in  a  given  time,  rather  than  there 
being  disappearance  of  a  given  quantity  of  substance  per  unit  time.  For  processes  such  as  drug 
disappearance  that  follow  first  order  kinetics,  plotting  linear  effect  vs  time  will  give  a  curved  line,  while  plotting 
log  effect  vs  time  will  give  a  straight  line,  and  the  slope  of  the  line  can  be  used  to  calculate  the  half-life1.  In 
pharmacokinetic  (PK)  analyses,  presence  of  an  inflection  point  in  the  log-linear  curve  indicates  a  distinct 
process  driving  the  rate  of  drug  disappearance.  For  example,  the  rate  of  drug  disappearance  during  the  initial 
portion  of  the  curve  (the  “distribution  phase”)  is  driven  by  drug  uptake  into  tissues,  the  second  portion 
(following  the  first  inflection  point)  may  be  driven  by  metabolism/excretion,  and  the  third  portion  (following  a 
second  inflection  point)  may  be  driven  by  saturation  of  metabolism/excretion  processes,  by  redistribution  of 
drug  from  tissues  to  blood,  etc1.  Additional  inflection  points  and  curve  segments  may  also  occur  with  some 
biological  processes. 
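
As a minimal illustration of the first order relationship described above, the sketch below recovers a half-life from the slope of a semilog plot. The function name and the example data are hypothetical and are not taken from the cited pharmacokinetic references; the only relationship used is that the half-life equals ln(2) divided by the negative slope of ln(value) vs time.

```python
import math

def half_life_from_semilog(times, values):
    """Least-squares slope of ln(value) vs time, converted to a half-life."""
    logs = [math.log(v) for v in values]
    n = len(times)
    t_mean = sum(times) / n
    log_mean = sum(logs) / n
    slope = (sum((t - t_mean) * (l - log_mean) for t, l in zip(times, logs))
             / sum((t - t_mean) ** 2 for t in times))
    # First order decay: ln(value) falls linearly with slope -k, and t1/2 = ln(2) / k.
    return math.log(2) / -slope

# Hypothetical first order process with a true half-life of 3 time units.
times = [0, 1, 2, 4, 8]
values = [100 * 0.5 ** (t / 3) for t in times]
half_life = half_life_from_semilog(times, values)
```

With data that follow a single first order process, the log-transformed points fall on one straight line and the recovered half-life matches the one used to generate them.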

We  hypothesized  that  patient  survival  variables  may  in  many  instances  also  follow  first  order  kinetics,  and  in 
these  instances,  plotting  log  %  OS,  PFS,  etc  vs  time  should  give  a  straight  line.  We  hypothesized  that 
dichotomous  variables  that  drive  prognosis  (eg,  variables  that  are  present  vs  absent  or  that  are  above  vs  below 
a  threshold)  would  give  an  inflection  point  on  a  plot  of  log  %  OS  or  PFS,  etc,  vs  time  in  the  same  manner  that 
different  processes  give  inflection  points  on  a  PK  curve,  while  continuous  variables  that  affect  prognosis  would 
alter  the  slope  of  the  survival  curve  without  giving  an  inflection  point.  If  these  hypotheses  were  correct,  then 
inflection  points  on  semilog  plots  of  survival  variables  could  give  insight  into  the  minimum  number  of 
dichotomous variables that are driving prognosis, and nonlinear regression analyses analogous to
compartmental PK analyses might permit one to define the proportion of the initial population accounted for by
each subgroup and the half-life of each distinct subgroup.
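
The hypothesis above can be illustrated with a small sketch, assuming a population made up of two subgroups that each decay with their own first order half-life. All parameter values are hypothetical. The apparent half-life read from the local slope of the semilog curve is short early on, when the rapid subgroup dominates, and approaches the slow subgroup's half-life later; that change in slope is what appears as an inflection point.

```python
import math

def survival_percent(t, rapid_fraction, rapid_half_life, slow_half_life):
    """Percent surviving at time t for a mixture of a rapid and a slow subgroup."""
    k_rapid = math.log(2) / rapid_half_life
    k_slow = math.log(2) / slow_half_life
    return 100 * (rapid_fraction * math.exp(-k_rapid * t)
                  + (1 - rapid_fraction) * math.exp(-k_slow * t))

def local_half_life(t, dt=0.01, **params):
    """Apparent half-life from the local slope of ln(% surviving) at time t."""
    y_now = survival_percent(t, **params)
    y_next = survival_percent(t + dt, **params)
    slope = (math.log(y_next) - math.log(y_now)) / dt
    return math.log(2) / -slope

# Hypothetical population: 80% rapid decay (half-life 6), 20% slow decay (half-life 60).
params = dict(rapid_fraction=0.8, rapid_half_life=6.0, slow_half_life=60.0)
early = local_half_life(1.0, **params)    # dominated by the rapid subgroup
late = local_half_life(120.0, **params)   # essentially only the slow subgroup remains
```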

With  respect  to  standard  analyses  that  use  similar  approaches,  proportional  hazards  models  generally  average 
the  entire  curve,  without  deriving  specific  information  from  curve  inflection  points.  Nonlinear  mixed  effects 
modelling3 and mixture distribution analyses4-6 have been used to estimate proportion of patients cured of a
malignancy, and there is at least a limited experience using them to assess impact of therapy or prognostic
variables5,7,8.

In  this  manuscript,  we  used  nonlinear  regression  exponential  decay  analyses  of  Kaplan-Meier  OS  and  PFS 
curves  from  published  non-small  cell  lung  cancer  (NSCLC)  clinical  trials  as  a  preliminary  feasibility  assessment 
of  the  potential  utility  of  such  approaches  in  assessment  of  impact  of  treatment  and  prognostic  variables  on 
patient  outcome.  In  embarking  on  the  exercise,  we  anticipated  that  we  would  detect  multiple  inflection  points 
on  most  curves,  in  keeping  with  there  being  several  dichotomous  variables  driving  prognosis. 


METHODS: 

In  this  preliminary  feasibility  assessment,  we  used  OS  and  PFS  curves  from  selected  NSCLC  published  trials 
involving front-line chemotherapy in advanced disease9-26, adjuvant chemotherapy in resected stage I-III
disease12,27-37 and survival as a function of stage10,12,20,26-30,32-44, and also used curves from best supportive
care arms from 2 second line therapy trials45,46. From printouts of these survival curves, height of curve above
baseline  was  measured  in  mm  for  different  time  points  from  initiation  of  therapy.  Height  for  each  time  point  was 
converted  to  a  percent  of  the  curve  height  at  time  0.  We  then  used  one-phase,  2-phase  and  3-phase 
exponential  decay  programs  in  GraphPad  Prism  version  5.0  to  model  the  data.  The  value  of  Y  at  time  =  0  was 
set  as  a  constant  at  100%  and  the  plateau  phase  (ie,  the  value  of  Y  at  time  =  infinity)  was  set  as  a  constant  at 
0%.  Curves  were  considered  to  conform  to  a  one-phase  model  rather  than  a  two-phase  model  (or  to  a  two- 
phase  model  rather  than  a  three-phase  model)  if  one  of  the  phases  accounted  for  <1%  of  the  patient 
population,  or  if  half-lives  for  two  phases  differed  by  <10%. 
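
The workflow above can be sketched in a few lines. This is not the GraphPad Prism procedure itself: it is a standard-library illustration that converts measured heights to percent of the time-0 height, fits only the one-phase model (Y fixed at 100% at time 0, plateau fixed at 0%) by least squares, and applies the stated rule for collapsing a model to fewer phases. The function names, the search bounds, and the example measurements are hypothetical, and "differed by <10%" is interpreted here as relative to the longer half-life, which is an assumption.

```python
import math

def to_percent(heights_mm):
    """Express each measured curve height as a percent of the height at time 0."""
    return [100 * h / heights_mm[0] for h in heights_mm]

def fit_one_phase(times, percents, k_low=1e-6, k_high=5.0, iterations=100):
    """Least-squares rate constant for Y = 100 * exp(-k * t), plateau fixed at 0.

    Golden-section search; assumes the squared error has a single minimum
    between k_low and k_high (the bounds are illustrative choices).
    """
    def sse(k):
        return sum((y - 100 * math.exp(-k * t)) ** 2 for t, y in zip(times, percents))
    ratio = (math.sqrt(5) - 1) / 2
    a, b = k_low, k_high
    for _ in range(iterations):
        c, d = b - ratio * (b - a), a + ratio * (b - a)
        if sse(c) < sse(d):
            b = d
        else:
            a = c
    return (a + b) / 2

def collapses_to_fewer_phases(phase_percents, half_lives):
    """True if any phase holds <1% of patients or two half-lives differ by <10%.

    Assumption: the 10% difference is taken relative to the longer half-life.
    """
    if any(p < 1.0 for p in phase_percents):
        return True
    ordered = sorted(half_lives)
    return any((longer - shorter) / longer < 0.10
               for shorter, longer in zip(ordered, ordered[1:]))

# Hypothetical measurements (mm) from a curve with a 6-month half-life.
months = [0, 3, 6, 12, 24]
heights_mm = [80 * 0.5 ** (t / 6) for t in months]
rate = fit_one_phase(months, to_percent(heights_mm))
half_life = math.log(2) / rate
```

Two-phase and three-phase fits need a general nonlinear optimizer with several free parameters, which is why this sketch stops at the one-phase case.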


Across  studies,  the  median  percent  of  patients  in  each  decay  phase  and  the  median  half-life  of  the  rapid- 
decay  phase  were  calculated  for  different  groups.  Groups  were  compared  using  Wilcoxon  signed  rank  tests 
for  matched  groups  (patients  treated  with  adjuvant  therapy  vs  control  groups  from  the  same  study),  and 
Kruskal-Wallis  testing  was  used  for  comparisons  of  non-matched  groups.  Chi-square  testing  with  Yates 
correction  was  used  to  compare  groups  with  respect  to  proportion  of  curves  fit  by  2-3  phase  decay  models  vs 
proportion  fit  by  only  one  phase  decay  models,  and  with  respect  to  proportion  of  curves  with  major  convexities. 


RESULTS: 

Some  typical  curve  shapes  are  outlined  in  Figure  1.  Of  172  OS  or  PFS  curves  analyzed,  72  (42%)  were  fit  by 
one-phase  exponential  decay  models,  92  (53%)  were  fit  by  two-phase  exponential  decay  models  (single 
inflection  point)  and  8  (5%)  were  fit  by  three-phase  exponential  decay  models  (two  inflection  points).  In  Table 
1  are  characteristics  of  exponential  decay  curves  for  different  patient  groups.  The  total  number  in  the  table 
exceeds  172  since  some  curves  were  included  both  in  assessments  of  effect  of  stage  as  well  as  assessment 
of  effect  of  therapy. 

Many  of  the  curves  had  small  shoulders  at  early  follow-up  time  points.  For  OS,  these  small  shoulders  were 
probably  related  in  part  to  selection  of  patients  with  relatively  good  performance  status.  For  PFS,  the  small 
shoulders  may  have  been  related  primarily  to  the  fact  that  first  re-evaluation  of  tumor  status  generally  didn’t 
occur  until  6-8  weeks  after  therapy  initiation.  Refitting  the  data  after  omitting  points  on  the  early  shoulder  in 
most  cases  did  not  alter  conclusions  about  number  of  curve  inflection  points  (data  not  shown). 

Overall,  41  of  the  curves  (24%)  had  substantially  more  than  just  an  initial  shoulder,  and  appeared  to  be  convex 
over  much  of  the  rapid  decay  phase.  Examples  are  presented  in  Figure  2.  Curve  characteristics  varied  with 
therapy  (Table  2).  In  patients  on  front-line  chemotherapy  trials  for  advanced  disease,  12  of  15  (80%)  OS  or 
PFS  curves  from  patients  receiving  best  supportive  care  or  single  agent  chemotherapy  could  be  fit  by  2-3 
phase  decay  models,  compared  to  only  6  of  48  (12.5%)  curves  from  patients  treated  with  regimens  involving  2 
or  more  agents  (p<0.001).  The  proportion  of  curves  fit  by  only  a  single  phase  decay  model  increased  with  the 
number of agents used in therapy. In addition, 54% of curves from patients treated with regimens involving ≥ 2
agents appeared to have convex rapid decay phases, compared to none of 15 curves for patients treated with
best supportive care or single agent therapy (p<0.001).
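
The two comparisons above can be checked with a short sketch of the chi-square test with Yates correction for a 2x2 table. The counts are those reported in the text and in Table 2 (12 of 15 vs 6 of 48 curves fit by 2-3 phase models; 0 of 15 vs 26 of 48 curves with convexity, where 26 is the sum of the 19 and 7 convex curves in Table 2). The function name is hypothetical, and the p-value uses the identity that, for 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x/2)).

```python
import math

def yates_chi_square(a, b, c, d):
    """Chi-square with Yates continuity correction for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    statistic = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the chi-square tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Rows: 0-1 agents vs 2 or more agents; columns: 2-3 phase fit vs one-phase fit.
phase_stat, phase_p = yates_chi_square(12, 3, 6, 42)
# Same rows; columns: convex rapid-decay phase vs not convex.
convex_stat, convex_p = yates_chi_square(0, 15, 26, 22)
```

Both p-values fall below 0.001, in line with the values reported above.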

Proportion  of  patients  in  the  rapid-decay  phase  and  rapid-decay  phase  half-lives  for  different  patient  groups  are 
presented  in  Table  3.  For  2-  and  3-phase  decay  curves,  the  models  frequently  hit  constraints  with  respect  to 
the  half-life  of  the  slow-decay  phase,  and  the  slow-decay  phase  half-lives  were  generally  very  long  with  very 
wide  95%  confidence  intervals,  and  hence  are  not  presented  in  this  preliminary  analysis.  For  curves  that  could 
be  fit  by  either  a  2-  or  3-phase-decay  model,  data  from  the  2-phase-decay  model  were  used  for  Table  3  and  for 
the  accompanying  analyses.  In  therapy  of  advanced  disease,  the  rapid-decay  phase  was  larger,  but  the  alpha 
half-life was longer with regimens involving ≥ 2 agents than with best supportive care or with single agent
therapy,  in  keeping  with  the  high  proportion  of  curves  from  patients  treated  with  multi-agent  regimens  that 
could  be  fit  with  only  one-phase  decay  models. 

With  adjuvant  chemotherapy,  there  was  a  smaller  PFS  rapid-decay  phase  with  adjuvant  chemotherapy  than  in 
control  groups  and  a  trend  towards  a  smaller  OS  rapid-decay  phase.  Median  alpha  half-life  (ie,  half-life  of  the 
rapid-decay  phase)  was  slightly  longer  with  adjuvant  chemotherapy  for  both  OS  and  PFS,  although  this  was 
not  statistically  significant.  When  one-phase  exponential  decay  half-lives  were  calculated  for  OS  and  PFS  from 
curves  in  studies  of  adjuvant  chemotherapy  vs  matched  controls,  both  OS  and  PFS  half-lives  were  significantly 
longer  in  the  adjuvant  groups  than  in  the  matched  control  groups  (Table  4). 

With  respect  to  stage,  6  of  10  (60%)  OS  curves  from  stage  I  untreated  patients  were  best  fit  by  one-phase 
decay models, compared to 4 of 22 (18%) OS curves from stage II-IV untreated patients (Table 1). Hence,
stage I curves tended to be characterized by one-phase decay with very long half-life. For stages II-IV, the
proportion  of  patients  in  the  rapid  decay  phase  increased  and  the  half-life  of  the  rapid  decay  phase  decreased 
with  increasing  stage  (Table  4). 


DISCUSSION: 


This  preliminary  assessment  suggests  that  it  may  be  feasible  to  use  nonlinear  exponential  decay  analysis  to 
assess  patient  survival  variables,  and  it  also  suggests  that  specific  hypotheses  may  be  generated  by  this 
approach,  such  as  the  ones  outlined  below.  It  is  stressed  that  substantially  more  work  will  be  needed  to 
determine  whether  or  not  our  observations  were  driven  solely  by  methodology-related  artefact,  but  the  results 
suggest  that  adaptations  of  procedures  such  as  mixture  distribution  analyses  and  nonlinear  mixed  effects 
modelling  to  permit  nonlinear  exponential  decay  analysis  of  censored  individual  patient  data  could  potentially 
provide  useful  insights  that  might  not  be  as  apparent  with  more  usual  survival  analysis  approaches. 

We  had  expected  that  we  would  routinely  detect  multiple  inflection  points  on  the  curves,  and  we  found 
substantially  fewer  inflection  points  than  anticipated.  The  sparseness  of  curve  inflection  points  suggests  to  us 
that  most  prognostic  variables  function  as  continuous  variables  affecting  curve  slope,  rather  than  functioning  as 
dichotomous  variables  that  slot  a  patient  into  a  specific  patient  subgroup.  Hence,  apparently  dichotomous 
prognostic  variables  like  gender  may  simply  be  surrogates  for  various  continuous  variables.  Even  with 
prognostic  variables  that  are  known  to  be  continuous,  it  is  common  practice  to  dichotomize  them  around  a  cut- 
point.  While  this  dichotomization  may  be  useful  in  helping  identify  factors  with  prognostic  significance,  the 
paucity  of  survival  curve  inflection  points  would  lead  us  to  hypothesize  that  models  that  use  continuous 
variables  would  do  a  better  job  of  predicting  outcome  of  individual  patients  than  would  models  that  dichotomize 
prognostic  variables.  For  example,  in  NSCLC,  it  may  be  useful  to  consider  actual  tumor  size  rather  than 
whether it is T1 (<3 cm diameter) or T2 (>3 cm), to consider number and bulk of nodes involved rather than
just grouping them as N0 to N3, and to explore use of some measure of hormonal status rather than simply
grouping  patients  as  male  vs  female.  Clinicians  generally  prefer  to  have  simple  “yes-no”  rules  in  deciding 
management  approaches,  but  it  may  be  time  to  consider  moving  beyond  this. 

The higher proportion of advanced disease studies with curves conforming to one-phase decay curves when
≥ 2 drugs are used compared to when 0-1 drugs are used and when compared to assessments of impact of
stage or effect of adjuvant chemotherapy for early stage disease could have arisen by chance, by inclusion of
more than one curve from some studies, or through observer bias. However, one biologically plausible
hypothesis that could explain it is that the routine practice of discontinuing chemotherapy after 3-6 cycles may
be synchronizing patient death. Randomized trials have generally failed to identify a benefit of continuing
chemotherapy beyond this point47-50, but it is possible that there may be specific subpopulations that would
benefit,  and  some  studies  of  maintenance  chemotherapy  have  suggested  that  it  may  be  of  benefit  in  some 
patients51.  It  would  be  of  interest  to  assess  tumor  molecular  characteristics  and  clinical  features  for  patients 
dying  along  the  leading  edge  of  the  convexity  to  determine  if  one  might  identify  such  a  specific  subpopulation 
that  would  benefit  from  maintenance  chemotherapy. 

Adjuvant  chemotherapy  was  associated  with  longer  OS  and  PFS  half-lives  compared  to  controls  when  only 
one-phase  exponential  decay  models  were  used,  in  keeping  with  randomized  trials27,37  and  meta-analyses52 
that  indicate  a  benefit  of  adjuvant  chemotherapy  in  resected  NSCLC.  When  we  used  for  each  study  the  model 
with  the  largest  number  of  phases  that  could  be  fit  successfully,  adjuvant  therapy  (compared  to  untreated 
matched  controls)  was  associated  with  a  significant  reduction  in  the  proportion  of  patients  in  the  rapid  decay 
phase  for  PFS  and  a  similar  trend  for  OS.  This  suggests  that  the  adjuvant  chemotherapy  is  actually  shifting 
patients  into  the  cured  fraction  and  not  just  prolonging  survival.  The  slight  trend  towards  prolongation  of  the 
alpha  half-life  would  suggest  that  it  also  may  be  somewhat  prolonging  survival  of  patients  who  are  not  cured. 

While the proportion of patients in the rapid-decay phase increased from stage II to stage IV, it was also high in stage I
patients.  This  is  probably  due  to  insufficient  follow  up  time  in  several  of  the  stage  I  studies  to  permit  detection 
of  a  slower  decay  phase.  The  shortening  of  the  alpha  half-life  as  one  goes  from  stage  I  to  stage  IV  disease 
and  the  increase  in  proportion  of  patients  in  the  rapid  decay  phase  as  one  goes  from  stages  II  through  IV 
suggests  that  molecular  characteristics  associated  with  rapid  tumor  cell  growth  increase  the  proportion  of 
patients  who  are  destined  to  die  of  disease  while  at  the  same  time  decreasing  survival  time  of  those  who 
eventually  die  of  NSCLC.  Hence,  one  might  hypothesize  that  a  patient  with  recurrent  stage  I  NSCLC  would 
have  more  indolent  disease  than  would  a  patient  with  equal  bulk  disease  that  was  stage  IV  at  presentation, 
and  that  the  two  would  tend  to  have  different  molecular  characteristics  and  possibly  different  treatment 
susceptibilities. In addition, this would suggest that molecular characteristics drive both prognosis and stage,
ie, a patient with relatively indolent disease might tend to have it discovered when it was still in an early stage
since it would stay at an early stage longer, while patients with more rapidly growing disease would be less
likely  to  have  the  disease  discovered  by  chance  while  it  was  still  early  stage.  Hence,  this  is  in  keeping  with  the 
concept  that  stage  at  presentation  is  a  surrogate  for  tumor  cell  growth  rate  in  addition  to  being  a  surrogate  for 
presence  of  micrometastatic  disease,  and  that  tumor  cell  molecular  characteristics  will  eventually  supplant 
stage  as  the  important  determinant  of  tumor  management  strategies. 

Adaptations  of  methods  such  as  mixture  distribution  or  nonlinear  mixed  effects  modelling  to  assess  exponential 
survival  decay  using  individual  patient  data  could  prove  useful  in  a  variety  of  ways.  Instead  of  just  assessing 
the  impact  of  a  prognostic  or  treatment  variable  on  outcomes  such  as  median  survival  or  percent  survival  at  a 
specific  time,  these  approaches  could  potentially  be  used  to  assess  impact  of  the  variables  on  a  variety  of 
individual outcome components such as proportion of patients shifted from poor outcome groups to better
outcome  groups,  half-life  of  each  subgroup,  etc.  It  could  also  be  used  to  estimate  maximum  achievable 
survival  time  for  members  of  each  subgroup,  proportion  of  the  total  population  accounted  for  by  each  subgroup 
at  different  time  points  along  the  survival  curve,  and  time  beyond  which  one  may  have  a  relatively 
homogeneous  population  of  good  prognosis  patients.  This  ability  to  predict  the  point  at  which  the  population 
becomes  homogeneous  could  be  particularly  useful  in  helping  identify  molecular  factors  associated  with  good 
prognosis.  We  plan  to  explore  this  further  using  individual  patient  data. 
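
As a rough sketch of the last two quantities mentioned above, and assuming a simple two-phase exponential model, the share of survivors belonging to the slow-decay subgroup at any time, and the time at which that share first reaches a chosen level, can be written in closed form. The parameter values and the 95% threshold below are hypothetical and are not estimates from the curves analyzed here.

```python
import math

def slow_share_among_survivors(t, rapid_fraction, rapid_half_life, slow_half_life):
    """Fraction of patients still alive at time t who belong to the slow-decay subgroup."""
    rapid = rapid_fraction * 0.5 ** (t / rapid_half_life)
    slow = (1 - rapid_fraction) * 0.5 ** (t / slow_half_life)
    return slow / (rapid + slow)

def time_to_homogeneity(target_share, rapid_fraction, rapid_half_life, slow_half_life):
    """Time at which the slow-decay subgroup first makes up target_share of survivors."""
    k_rapid = math.log(2) / rapid_half_life
    k_slow = math.log(2) / slow_half_life
    # The slow:rapid odds grow as odds_at_zero * exp((k_rapid - k_slow) * t).
    odds_needed = target_share / (1 - target_share)
    odds_at_zero = (1 - rapid_fraction) / rapid_fraction
    return max(0.0, math.log(odds_needed / odds_at_zero) / (k_rapid - k_slow))

# Hypothetical population: 80% rapid decay (half-life 12), 20% slow decay (half-life 120).
params = dict(rapid_fraction=0.8, rapid_half_life=12.0, slow_half_life=120.0)
t_95 = time_to_homogeneity(0.95, **params)
share_at_t_95 = slow_share_among_survivors(t_95, **params)
```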


Table 1. Characteristics of exponential decay curves for different patient groups

                                                  No. curves                     Rapid-phase curve convexity(a)
Group                                   1 phase decay  2 phase decay  3 phase decay      Yes      No

Front Line Chemotherapy for Advanced Disease:
  Overall survival:
    Best supportive care(b)                   1              4              1              0        6
    Single agent                              2              3              0              0        5
    Two-drug regimen                         22              4              2             13       15
    ≥ Three-drug regimen                      7              0              0              5        2
  Progression-free survival:
    Best supportive care(c)                   0              2              0              0        2
    Single agent                              0              1              0              0        1
    Two-drug regimen                         11              0              0              6        5
    ≥ Three-drug regimen                      2              0              0              2        0

Adjuvant Chemotherapy for Stage I-III Resected Disease:
  Overall survival:
    Adjuvant chemotherapy                     6             17              1              5       19
    Control                                   8             12              1              3       18
  Progression-free survival:
    Adjuvant chemotherapy                     1             11              2              1       13
    Control                                   1             11              0              0       12

Stage(d):
  Overall survival:
    Stage I                                   6              5              0              1       10
    Stage I-II(e)                             0              1              0              0        1
    Stage I-III(e)                            1              8              0              0        9
    Stage II                                  1              6              1              2        6
    Stage III                                 2              6              2              2        8
    Localized, incomplete resection           0              1              0              0        1
    Stage IV                                  1              4              1              1        5
  Progression-free survival:
    Stage I                                   1              1              0              0        2
    Stage I-II(e)                             0              1              0              0        1
    Stage I-III(e)                            0              2              0              0        2
    Stage II                                  0              1              0              0        1
    Stage III                                 0              2              0              0        2
    Localized, incomplete resection           0              5              0              0        5
    Stage IV                                  0              2              0              0        2

Other subgroup analyses:
    Overall survival                         10             10              1              3       18
    Progression-free survival                 0              2              0              0        2

a. Convexity that is more than simply a shoulder on the initial part of the curve

b. Includes 2 curves from best supportive care arms of studies of second line therapies

c. Includes 1 curve from best supportive care arm of a study of second line therapy

d. From publications on survival vs stage and from control arms of adjuvant studies and best supportive
care arms of studies of chemotherapy for advanced disease

e. Not broken down by individual stage

Table 2. Effect of therapy details on curve characteristics

No. drugs     No. curves     No. with 2-3 phase decay(a)     No. with convexity(a)
0                  8                    7                            0
1                  7                    5                            0
2                 39                    6                           19
≥3                 9                    0                            7

a. p <0.001 for 0-1 drugs vs ≥ 2 drugs (Chi-square with Yates correction)

Table 3. Comparison of proportion of patients in rapid-decay phase and rapid-decay-phase half-lives across
groups.

Group                         No.      % Rapid decay                                  Alpha half-life
                              studies  Median  Range      P                           Median  Range       P

First line chemotherapy for advanced NSCLC:
OS: Best supportive care      6        96.7    82.6-100   0.006 (a,b)                 4.4     3.8-4.7     0.0008 (a,b)
OS: Single agent              5        94.8    66.4-100                               4.9     1.5-7.1
OS: Two drugs                 28       100     12.9-100                               7.5     2.5-11.5
OS: ≥ Three drugs             7        100     100-100                                6.8     4.6-12.2
PFS: Best supportive care     2        94.5    91.9-97.1  0.0019 (a,b)                2.7     2.4-3.0     0.06 (a,b)
PFS: Single agent             1        14.6    -                                      2.2     -
PFS: Two drugs                11       100     100-100                                3.9     3.2-5.2
PFS: ≥ Three drugs            2        100     100-100                                4.5     3.6-5.4

Adjuvant chemotherapy vs control (e):
OS adjuvant chemo             24       89.3    1.3-100    0.12 (c,d)                  63.7    4.4-245.0   0.47 (c,d)
OS controls                   24       94.4    43.9-100                               45.1    7.0-339.0
PFS adjuvant chemo            13       68.2    29.9-100   0.02 (c,d)                  12.6    5.0-58.2    0.54 (c,d)
PFS controls                  13       85.0    44.0-100                               10.9    5.1-98.6

Survival by stage (f):
OS: Stage I                   11       100     40.6-100   0.23 (a,b); 0.04 (a,h)      105.6   15.3-339    0.0001 (a,b); 0.0004 (a,h)
OS: Stage I-II (g)            1        65.6    -                                      34.8    -
OS: Stage I-III (g)           9        80.8    4.3-100                                12.6    6.6-59.3
OS: Stage II                  8        79.1    49.4-100                               19.1    12.7-58.5
OS: Stage III                 10       92.5    62.8-100                               11.5    6.6-33.6
OS: Stage IV                  6        96.9    82.9-100                               4.3     3.1-5.9
PFS: Stage I                  2        87.3    74.6-100   0.19 (a,b)                  69.4    40.1-98.6   0.13 (a,b)
PFS: Stage I-II (g)           1        53.5    -                                      11.0    -
PFS: Stage I-III (g)          2        33.7    23.3-44.0                              8.5     6.1-10.9
PFS: Stage II                 1        72.0    -                                      13.2    -
PFS: Stage III                2        77.5    77.5-85.0                              8.0     6.0-10.0
PFS: Stage IV                 2        94.5    91.9-97.1                              2.7     2.4-3.0

a. Kruskal-Wallis

b. comparison across groups

c. Wilcoxon signed rank test

d. vs matched control group

e. Total numbers differ from Table 1 since control arms were included more than once here if compared to more
than one chemotherapy arm from the same trial, while being included only once in Table 1. Total numbers
differ from Table 3 since Table 3 included only single OS or PFS curves from each trial, while Table 4 also
includes curves from subgroup analyses.

f. From studies on effect of stage or on untreated control arms from adjuvant therapy studies and studies of
therapy for advanced disease

g. from studies for which data were not broken down by individual stages

h. comparison across stages II to IV


Table 4. Overall and progression-free survival half-lives with adjuvant chemotherapy for stage I-III disease (a)

Group                        No. studies   Overall half-life         P
                                           Median   Range

Overall survival:
  Adjuvant chemotherapy      13            61.9     20.2-299.0       0.0002 (b)
  Matched control            13            48.6     7.0-226.0

Progression-free survival:
  Adjuvant chemotherapy      6             69.4     17.0-221.0       0.03 (b)
  Matched control            6             54.7     8.8-98.6

a. Curves for adjuvant therapy compared to matched control curves, using half-lives derived from one-phase
exponential decay models

b. Wilcoxon signed rank test comparing adjuvant therapy curve to matched control curve
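As a rough illustration of how a half-life follows from a one-phase exponential decay model, the sketch below fits ln(S) against time by ordinary least squares and converts the rate constant k to a half-life, ln(2)/k. The data points are synthetic, generated from the matched-control median of 48.6 in Table 4; the function name and the log-linear fitting approach are assumptions for illustration, not the fitting software or procedure used for the tables.

```python
import math

def fit_one_phase_half_life(times, surviving_pct):
    """Least-squares fit of ln(S) = ln(S0) - k*t; returns (k, half_life)."""
    logs = [math.log(s) for s in surviving_pct]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    k = -sxy / sxx                 # decay rate constant (slope is -k)
    return k, math.log(2) / k      # half-life = ln(2) / k

# Synthetic percent-surviving values that halve every 48.6 time units (hypothetical curve).
times = [0, 12, 24, 36, 48, 60]
true_half_life = 48.6
surviving = [100 * 0.5 ** (t / true_half_life) for t in times]

k, half_life = fit_one_phase_half_life(times, surviving)
print(f"k = {k:.5f} per time unit, half-life = {half_life:.1f}")
```

On a semilog plot a one-phase curve of this kind is a straight line, which is why a linear fit to ln(S) recovers the rate constant.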

Figure 1. Typical one- to three-phase decay survival curves, showing both semilog plots and the corresponding
linear plots

Figure 2. Examples of semilog plots with convex rapid phases (with and without a possible second phase), and
corresponding linear plots


Abstract:

Background: Non-small cell lung cancer (NSCLC) is a heterogeneous disease, with a variety of signaling
pathways driving disease progression and therapeutic resistance. To better understand the role of cytokines and
angiogenic factors in these pathways, we performed in vitro profiling of proteins secreted by NSCLC tumor cells.
Methods: Using a multiplex bead assay, 43 cytokines and angiogenic factors (CAFs) were measured in conditioned
media from forty subconfluent NSCLC cell lines. Unsupervised clustering of all the factors was performed to
identify CAF signatures among the cell lines. Individual CAF levels were then correlated with the cell lines'
mutation status (EGFR and K-Ras mutated versus wild type) and sensitivity to various chemotherapies and
targeted agents (as determined by the concentration required to inhibit growth by 50%) using a two-sample t-test.
Results: Unsupervised clustering of the 43 CAF levels for the NSCLC cell lines revealed at least two distinct CAF
signatures among the cell lines. Individual CAFs (e.g., FGF, IL-10, MIP-1b) were significantly correlated with
EGFR and K-Ras mutational status, with p-values < 0.05 by t-test comparing mutated to wild type cell lines.
Additionally, certain factors were associated with sensitivity to specific drugs. For example, sensitivity to the
EGFR inhibitors erlotinib and gefitinib was associated with elevated Gro-alpha and low IL-12, while docetaxel
sensitivity was associated with higher TNF-beta. Conclusions: This exploratory analysis demonstrates that
NSCLC cell lines have distinct patterns of protein secretion that are associated with response to treatment. These
results are being further investigated in clinical samples from patients treated with these agents as potential
predictive markers of treatment response. (Supported by P50 CA70907, W81XWH-07-1-0306 01, and
W81XWH-06-1-0303)
Abstract:

Background: Pts with early stage NSCLC, especially with nodal disease, have a poor prognosis despite curative
intent therapy. It is unclear which pts may benefit from chemotherapy. The primary endpoint of the study
was to assess the tolerability of the regimen. In addition, the clinical response to the chemotherapy regimen as
well as modulation of tumor biomarkers will be examined. Methods: Pts had previously untreated, potentially
surgically resectable, stage I-III NSCLC with ECOG performance status (PS) 0-1 and adequate laboratory
parameters. After baseline tissue was obtained, chemotherapy was administered (docetaxel [T] 75 mg/m2 and
cisplatin [P] 80 mg/m2 every 3 wks) for 3 cycles. Subsequently, pts underwent restaging, then planned definitive
therapy with surgical resection. Pts were then offered treatment for 1 year with erlotinib (E) 150 mg PO daily.
Bronchoscopic biopsies were performed at 6 months and 1 year post-surgery. Results: 41 pts were enrolled
between 2/07 and 11/08. 3 were not eligible and did not receive treatment. Of the 38 eligible pts: median age was
65 years (42-80); 24 (63.2%) were male; 26 (68%) were PS 1. Stage IB 18% (7), IIB 37% (14), IIIA 39% (15), IIIB
5% (2). 31 pts completed all 3 cycles (35 pts completed at least 2 cycles). 32 pts underwent definitive surgical
resection with 1 pt pending surgery. 5 others did not undergo surgery: pneumonia (1), progressive disease (1),
definitive chemo-radiation (3). For pts completing at least 2 cycles of chemotherapy, the radiographic response
rate was 57% (20) by RECIST criteria with 40% (14) having stable disease. 1 pt had a complete pathologic
response. 16 pts have started adjuvant E; 4 have completed 1 yr of treatment. Grade 3/4 toxicities included
neutropenia (6 pts) and hypokalemia (4 pts). Blood and tissue specimens will be analyzed to assess sensitivity to
chemotherapy. Conclusions: Neoadjuvant T and P is a tolerable and active regimen with an encouraging
response rate in stage I-III resectable NSCLC. In addition to clinical characteristics, determining which patients
will benefit from chemotherapy by analyzing their tumor biomarkers may help improve overall outcomes of
pts with curable lung cancer. Supported by grant DoD# W81XWH-07-1-0306.
Secreted Cytokine and Angiogenic Factor (CAF) profiles associated with
age and sex in NSCLC.

Matthew H. Herynk1, Emer Hanrahan1, Heather Yan Lin2, Tina Cascone1,
Shaoyu Yan3, Lauren Byers4, John Yordy5, J. Jack Lee2, Hai T. Tran1, and John
V Heymach1.

Departments of Thoracic/Head and Neck Medical Oncology1, Biostatistics and
Mathematics2, Pharmacy Pharmacology Research3, Cancer Medicine4, and
Radiation Oncology5.

Background:  Subgroup  analyses  from  recent  clinical  trials  in  non-small  cell  lung 
cancer  (NSCLC)  suggest  therapeutic  efficacy  in  a  sex-specific  manner  from 
drugs  such  as  bevacizumab  and  vandetanib.  These  differences  suggest  that 
factors  inherent  in  the  basic  male/female  biology  may  impact  growth  and  survival 
mechanisms  in  NSCLC  tumors.  We  sought  to  identify  if  there  are  sex-specific 
differences  in  secreted  cytokine  and  angiogenic  factors  (CAFs)  in  NSCLC  cell 
lines  and  patient  samples. 

Methods: Thirty-five CAFs were measured by multiplex bead suspension arrays
(MBSA) and ELISAs from pre-treatment plasma (N=123) and serum (N=151)
collected from patients with stage IIIB/IV NSCLC participating in randomized
phase 2 trials of vandetanib alone or in combination with chemotherapy. MBSA
were used to measure the levels of 48 secreted CAFs in conditioned media from
36 NSCLC cell lines (female N=17, male N=19). Subconfluent cells were
serum-starved overnight and the media was changed; 24 hours later, conditioned media
was collected and the cells were lysed. Measured CAF levels were normalized
to total protein from whole cell lysates.

Results: Univariate analysis of serum and plasma samples revealed statistically
significant differences in the concentrations of 18 CAFs between male and
female patients, with most being higher in females, including plasma IL-15 (mean
1193 vs. 291 pg/ml; P=0.0009), sIL-2R (mean 1413 vs. 577 pg/ml; P=0.004),
MIG (CXCL-9) (mean 184 vs. 67 pg/ml; P=0.0007), and macrophage
inflammatory protein-1 alpha (MIP-1 alpha, CCL3) (mean 319 vs. 108 pg/ml;
P=0.0067). Conditioned media from 36 NSCLC cell lines was analyzed for levels
of secreted CAFs. Nine CAFs determined to be statistically significant in patient
samples were also present in the cell line analyses, and two factors, MIP-1 alpha
and intracellular adhesion molecule-1 (ICAM-1), also demonstrated increased
levels in female versus male cell lines, but these differences did not reach
statistical significance. While 18 CAFs were statistically significant in patient
samples, no individual factors were statistically significant in conditioned media
from cell lines. Subgroup analysis of female cell lines revealed an age
association with 26 secreted CAFs in NSCLC cell lines. The majority were
upregulated in cell lines originally derived from patients >50 y/o (N=10) vs <50
(N=5), including IL-15 (2.05 vs. 1.27 pg/ml, P=0.011), MIG (0.12 vs. 0.095 pg/ml,
P=0.033), EGF (14.21 vs. 11.25 pg/ml, P=0.034), and ICAM-1 (11.08 vs. 7.66
pg/ml, P=0.057).


Conclusions: Significant CAF differences were observed when male and female
patient samples and conditioned media from cell lines were analyzed, thus
suggesting an important role for age and sex in the secreted CAF profiles of
NSCLC. Because EGFR inhibitors have shown preferential efficacy for females,
and hormone signaling varies between male and female populations as well as
between younger and older women, the contributions of EGFR and hormone
signaling to the sex-specific differences in secreted factors are being further investigated.

These  secreted  factors  are  involved  in  a  number  of  signaling  networks  and  thus 
may  contribute  to  a  broad  range  of  effects  on  tumor  growth,  metastases,  and 
therapeutic  efficacy  of  angiogenesis  inhibitors  and  other  targeted  agents. 
Development of a Quantum Dots (QDs)-based Quantification Method for
Multiplexed Biomarkers in Prediction of Metastasis

Dong-hai Huang1, Clifford C. Hoyt2, Xiang-hong Peng1, Dongsheng Wang1, Hongzheng Zhang1,
Fadlo R. Khuri1, Dong M. Shin1, Zhuo (Georgia) Chen1

1. Department of Hematology/Oncology, Winship Cancer Institute, Emory University, Atlanta, GA
2. Cambridge Research & Instrumentation (CRi), Inc., Woburn, MA

Quantum  dots  (QDs)  are  semiconductor  nanoscale  particles  with  novel  optical 
properties  well-suited  to  multiplexed  immunostaining  of  formalin-fixed  paraffin 
embedded  (FFPE)  tissues.  In  this  study,  we  report  development  of  a  novel 
quantification  method  that  utilizes  QDs,  multispectral  imaging  and  advanced  image 
analysis  based  on  “machine  learning”,  to  provide  per-cell,  flow  cytometry-like 
quantification  of  protein  expression  levels  in  cancer  cells  in  intact  tissue  sections.  We 
look at three proteins, EGFR, E-cadherin (E-cad) and β-catenin (β-cat), in lung and
head and neck cancers. QD secondary antibody conjugates (emitting at 605, 705, and
655 nm) were used to detect protein expression levels, which were evaluated for
correlation with clinical characteristics. Method development and validation included: (1)
the comparison of single biomarker detection using conventional immunohistochemistry
(IHC) with the same detection using QD-based immunohistofluorescence (IHF); and (2)
the comparison of biomarker signals from samples stained with single QD IHF in serial
sections with biomarker signals from the same proteins but from samples stained
simultaneously with multiple QD IHF. FFPE tissue sections from 30 head and neck
cancer cases and 20 lung cancer cases were used for the validation. Both Pearson's
and Spearman's tests show significant correlation between IHC and QD-IHF for the
single marker staining tests (EGFR: correlation coefficient r2=0.8-0.9, p<0.00001;
E-cad: r2=0.9, p<0.00001; β-cat: r2=0.7-0.8, p<0.00001), and for the single-plex versus
multiplex tests (EGFR: r2=0.8-0.9, p<0.00001; E-cad: r2=0.8, p<0.00001; β-cat:
r2=0.7-0.8, p<0.00001). Images of the 30 head and neck FFPE samples (which consisted of
ten non-metastatic primary tumors (TuMet-), ten metastatic primary tumors (TuMet+), and
ten matched lymph node metastases (LNM)) were acquired with a CRi multispectral
camera and analyzed with CRi advanced machine learning-based software for
multiplexed quantification. A weighted index, scored as positive when at least two of the
three biomarkers are expressed above or below defined thresholds in TuMet+ samples, was
tested for its power to predict LNM. Cut-off thresholds of E-cad < 53, EGFR < 65 and
β-cat > 40 were used. In the current study, the positive predictive value (PPV) of the
weighted index is 77.8%, the negative predictive value (NPV) is 72.7%, the sensitivity is
70% and the specificity is 80%. In summary, a quantification system of multiplexed
biomarkers using QD-IHF has potential applications in prediction of LNM and in validation
and monitoring of the outcome of anticancer therapies. (Supported by grants from NIH
R21 CA125062, DOD W81XWH-07-1-0306 Project 5, and GCC Distinguished Scholar
Award to ZC).
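The weighted index and the reported predictive values can be sketched as follows. The cut-offs are those quoted in the abstract; the marker scores passed to `weighted_index` and the 2x2 counts (7 of 10 TuMet+ and 2 of 10 TuMet- primaries scored index-positive) are assumptions chosen because they reproduce the reported percentages, and are not taken from the study data.

```python
def weighted_index(e_cad, egfr, beta_cat):
    """Positive when at least two of the three markers cross the cut-offs quoted in the abstract."""
    flags = [e_cad < 53, egfr < 65, beta_cat > 40]
    return sum(flags) >= 2

def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic accuracy measures."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical marker scores for two tumors (illustration only).
print(weighted_index(e_cad=50, egfr=60, beta_cat=30))   # two cut-offs crossed
print(weighted_index(e_cad=60, egfr=70, beta_cat=45))   # only one cut-off crossed

# Assumed counts consistent with the reported values: 10 TuMet+ and 10 TuMet- primaries.
metrics = diagnostic_metrics(tp=7, fp=2, fn=3, tn=8)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

With these assumed counts the sketch gives a sensitivity of 70%, specificity of 80%, PPV of 77.8% and NPV of 72.7%.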


Immunohistochemical Expression of Membrane Transporters Correlates with Histology
of Non-Small Cell Lung Carcinoma. Maria Nunez, Carmen Behrens, Heather Lin,
Ludmila Prudkin, Milind Suraokar, Denise M. Woods, Luc Girard, John Minna, Jack Lee,
Wayne Hofstetter, Wilbur Franklin, Cesar A. Moran, Waun Ki Hong,
David Stewart, Ignacio I. Wistuba.

Membrane transporters folate receptor alpha (FOLR1), reduced folate carrier 1 (RFC1),
copper transporter receptor 1 (CTR1), glucose transporter 4 (GLUT4) and RHOA regulate uptake of
molecules and drugs into the cell. FOLR1 and RFC1 are overexpressed in epithelial
tumors and are potential therapeutic targets and tumor biomarkers; however, there is
limited information on the expression of these receptors in non-small cell lung carcinoma
(NSCLC).

Immunohistochemical  (IHC)  protein  expression  of  FOLR1,  RFC1,  CTR1,  GLUT4  and 
RHOA  was  examined  in  320  surgically  resected  NSCLCs  placed  in  tissue  microarrays, 
including  202  adenocarcinomas  and  110  squamous  carcinomas,  and  correlated  with 
patients’  clinico-pathological  characteristics.  A  semiquantitative  IHC  score  was  obtained 
assessing  intensity  of  immunostaining  and  percentage  of  positive  tumor  cells. 

The pattern of IHC expression varied in malignant cells, with FOLR1, RFC1 and
GLUT4 expressed in the membrane and cytoplasm, CTR1 expressed in the cytoplasm
and nucleus, and RHOA expressed only in the cytoplasm. In all cases expression in tumor
cells was higher than in non-malignant lung epithelial cells. Tumor stroma IHC
expression was frequently detected, especially in endothelial cells, lymphocytes,
macrophages and fibroblasts. Adenocarcinomas showed significantly higher expression
compared with squamous cell carcinoma for most markers, including membrane
(P<0.001) and cytoplasmic (P<0.001) FOLR1, cytoplasmic (P<0.001) and nuclear
(P<0.004) CTR1, and cytoplasmic RHOA (P<0.001). Female NSCLC patients had
significantly higher expression of membrane and cytoplasmic FOLR1 (P<0.01) compared
with male patients. Ever-smoker patients demonstrated significantly lower expression of
membrane (P<0.001) and cytoplasmic FOLR1 (P0.002), and higher expression of
membrane (P=0.04) and cytoplasmic (P=0.03) GLUT4, and membrane RFC1 (P=0.01),
compared with never smokers. In adenocarcinomas, the presence of EGFR mutations
correlated with higher expression of membrane FOLR1 (P0.002), and KRAS mutation
with higher expression of membrane GLUT4 (P0.004) and lower expression of nuclear
CTR1 (P=0.02). Finally, squamous carcinomas showed higher positive endothelial cell
expression of FOLR1 (P=0.00001) than adenocarcinomas.

We conclude: 1. membrane transporter proteins are overexpressed in NSCLC compared
to normal lung epithelium; 2. significant differences were found between
adenocarcinomas and squamous lung cancer in both tumor cells and the tumor
microenvironment; 3. differences were found between tumors of males and females, between
tumors from never and ever smokers, and in tumors with EGFR or KRAS mutations. The
different patterns of transporter expression may explain the superior response of NSCLC
patients with adenocarcinoma histology to pemetrexed. Supported by grants US DoD
W81XWH-07-1-0306, and UT-Lung SPORE P50CA70907
Abstract:

Background: The optimal multi-modality treatment for resectable malignant pleural mesothelioma (MPM)
remains unknown. We designed a biomarker-based neoadjuvant trial from our preclinical studies showing that
dasatinib, a multi-targeted Src kinase inhibitor, has activity against MPM and target specificity to Src Tyr419.
Methods: Untreated MPM patients underwent extended surgical staging (ESS) with multiple biopsies to account
for tumor heterogeneity and lymph node status and to rule out sarcomatoid features. Patients deemed surgical
candidates for either pleurectomy/decortication (P/D) or extrapleural pneumonectomy (EPP) received 4 weeks of
oral dasatinib (70 mg BID) followed by P/D or EPP. If either a radiographic or molecular response
(dephosphorylation of Src Tyr419 in tumor) is achieved, an additional 2 years of dasatinib maintenance after
adjuvant radiotherapy and systemic chemotherapy is given. The primary endpoint of this trial was biomarker
modulation of Src Tyr419. Secondary endpoints included response, survival, safety/toxicity, and biomarker
modulation in tumor/serum/platelets/pleural effusion. The total planned sample size is 24 to detect a 50%
reduction in positive p-Src Tyr419 expression with 80% power, one-sided 10% type I error rate, and 10%
inevaluable rate. Results: To date, ten patients have registered on the trial (4/08 - 12/08); six have successfully
completed the ESS, neoadjuvant dasatinib, and P/D (n=3) or EPP (n=3). Two patients are still receiving
neoadjuvant dasatinib, and 2 patients were deemed not to be surgical candidates: one because of a rapid decline
in PS and one who was found to have bilateral mesothelioma. The main side effects of dasatinib were grade 1-2:
anemia, nausea, vomiting, anorexia, electrolyte abnormalities, fatigue, and anxiety. Grade 3 toxicities included
hyperkalemia (1), infection - pneumonia (1), and hypoxia (1). There were no grade 4-5 toxicities. Post-surgical
grade 3 toxicity included anemia, electrolyte abnormalities, arrhythmia, HTN, and pleural effusion; one grade 4
episode of hyperglycemia was seen. Conclusions: This study demonstrates that biomarker-based neoadjuvant
MPM trials with novel agents are feasible. Updated clinical and translational correlative results will be presented.


Associated presentation: Phase I trial of neoadjuvant dasatinib in patients with resectable malignant pleural
mesothelioma. Meeting: 2009 ASCO Annual Meeting. Abstract No: 7580. First Author: R. Mehran.
Presenter: Reza Mehran, MD. Category: Lung Cancer - Local-Regional and Adjuvant Therapy - Local-Regional
Therapy. Session: Lung Cancer - Local-Regional and Adjuvant Therapy (General Poster Session).


Keap1 and Nrf2 Expression in Non-Small Cell Lung Carcinomas Correlates with
Clinicopathological Features

Solis LM, Behrens C, Bekele BN, Suraokar M, Ozburn N, Moran CA, Minna J, Stewart
Most non-small cell lung carcinomas (NSCLC) demonstrate resistance to chemotherapy.
Nuclear factor erythroid-2 related factor 2 (Nrf2) is a transcription factor associated with
in vitro resistance to chemotherapy. Kelch-like ECH-associated protein 1 (Keap1) is a
cytoplasmic repressor of Nrf2. KEAP1 inactivation is a relatively frequent genetic
alteration in NSCLC, and leads to Nrf2 activation (Singh et al, PLoS Med 3:e240, 2006).
We investigated the immunohistochemical (IHC) expression of nuclear Nrf2 and
cytoplasmic Keap1 proteins in 304 surgically resected NSCLC tissues in tissue
microarrays (adenocarcinomas, n=190; squamous cell carcinomas, n=114). We correlated
those findings with patients' clinicopathological features, and in adenocarcinomas with
EGFR and KRAS mutations. We also examined the expression of Nrf2 and Keap1 using
whole tissue sections in 79 NSCLC tumors (36 chemo-naive and 43 treated with
neoadjuvant chemotherapy). We detected Nrf2 expression in 26% (77/299) of NSCLCs,
with expression significantly higher in squamous cell carcinoma (43/112, 38%) compared with
adenocarcinoma (34/188, 18%; P=0.0001). In adenocarcinomas, Nrf2 was not expressed
in EGFR mutant tumors (0/23) compared with wild-type tumors (31/145, 21%; P=0.009).
Keap1 expression score was significantly higher in squamous cell carcinoma compared
with adenocarcinoma (P<0.0001). In patients with NSCLC stage I/II who did not receive
adjuvant or neoadjuvant treatment, Nrf2 overexpression significantly correlated with poor
overall survival in multivariate analysis (HR=2.468; 95%CI 1.468, 4.151; P=0.0007). In
patients with squamous cell carcinoma histology, low Keap1 expression correlated with
poor overall survival (HR=0.479; 95%CI 0.260, 0.882; P=0.018). We found that Nrf2
expression in tumor tissue sections is heterogeneous and ranges from 5-80%
(mean=27%) of tumor cells. NSCLC resected from patients treated with neoadjuvant
chemotherapy showed Nrf2 expression in 28% (12/43) of NSCLC tumors, being higher
in squamous cell carcinoma (5/11, 45%). KEAP1 mutation (exons 2-5) was detected in
1/20 tumors examined. Normal bronchial epithelia adjacent to NSCLC tumors did not
show Nrf2 expression, suggesting that no field effect phenomenon on Nrf2 expression is
present. We conclude that: 1. increased expression of Nrf2 and decreased expression of
Keap1 are relatively frequent abnormalities in NSCLC, especially in squamous cell
carcinoma histology; and, 2. altered IHC expression of these markers correlates with
NSCLC patients' outcome. The identification of the subset of patients with abnormal
expression of Nrf2 may be important for better selection of treatment in NSCLC.
(Supported by grants US DoD W81XWH-07-1-0306, and UT-Lung SPORE
P50CA70907).
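The hazard ratios, confidence intervals and P values quoted above can be cross-checked with a short sketch. It assumes the intervals are Wald intervals, symmetric on the log scale, so that the standard error of ln(HR) is the log-width of the 95% interval divided by 2 x 1.96; the function name and that assumption are illustrative and are not stated in the abstract.

```python
import math

def p_from_hazard_ratio(hr, lower, upper, z_crit=1.959964):
    """Recover the Wald z statistic and two-sided p-value from a hazard ratio and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # standard error of ln(HR)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))                     # two-sided normal tail probability
    return z, p

# Values quoted in the abstract.
z_nrf2, p_nrf2 = p_from_hazard_ratio(2.468, 1.468, 4.151)    # Nrf2 overexpression, stage I/II
z_keap1, p_keap1 = p_from_hazard_ratio(0.479, 0.260, 0.882)  # Keap1, squamous cell carcinoma

print(f"Nrf2:  z={z_nrf2:.2f}, p={p_nrf2:.4f}")
print(f"Keap1: z={z_keap1:.2f}, p={p_keap1:.3f}")
```

Under this assumption the recovered p-values round to the reported 0.0007 and 0.018, so the quoted intervals and P values are mutually consistent.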
Abstract

Epiregulin (EREG) is a growth factor that belongs to the epidermal growth factor (EGF)
family. Although recent studies have reported the overexpression of EREG in several
types of cancers including pancreas, colon and bladder cancers, its biological and
clinicopathological significance in lung cancer development still remains unknown,
leading us to study the expression, molecular genetic and clinical correlations and
functional consequences of EREG in lung cancers. We first examined the expression of
EREG mRNA by quantitative RT-PCR and correlated this with clinical parameters,
KRAS, and EGFR mutation status in 63 lung cancer cell lines including 26 small cell
lung cancer (SCLC) cell lines and 37 non-small cell lung cancer (NSCLC) cell lines,
and 89 primary NSCLC tumor specimens. EREG expression was significantly higher in
NSCLC compared to SCLC lines (P<0.001) and was significantly higher in NSCLC
lines with KRAS mutations than NSCLC lines with wild-type KRAS (P=0.018). In
primary NSCLC tumors, EREG expression was significantly higher in KRAS
mutation-positive tumors (P=0.032) but lower in EGFR mutation-positive tumors
(P=0.002). EREG was abundantly expressed in tumors with pleural involvement
(P=0.002), lymphatic permeation (P=0.026) or vascular invasion (P=0.004) compared to
those without such characteristics. We performed combined microarray expression
profiling on 4 NSCLC lines with or without short hairpin RNA (shRNA)-mediated
stable KRAS knockdown and immortalized human bronchial epithelial cells
(HBECs) with and without oncogenic KRAS, and found that EREG was
significantly upregulated by oncogenic KRAS. Small interfering RNA
(siRNA)-mediated knockdown of EREG expression inhibited in vitro cell growth of
KRAS mutant/EREG overexpressing NSCLC cells, while siRNAs targeted at EREG did
not affect the growth of EREG-nonexpressing NSCLC cells. These results indicate
that oncogenic activation of KRAS positively regulates EREG expression, which
contributes to aggressive phenotypes of NSCLC tumors, and identify EREG as a
therapeutic target for KRAS mutant NSCLCs.
MPM is a highly aggressive neoplasm with a poor prognosis, and new, critical therapeutic targets are needed.
MicroRNAs (miRNAs) play an important role in many different types of cancer, but there is a lack of published reports detailing their
role in MPM. We employed a global profiling strategy using miRNA microarrays to search for miRNAs involved in the
pathogenesis of MPM. Analysis of miRNA profiles on Agilent human miRNA microarray v1 slides showed up-regulation of
44 and down-regulation of 29 miRNAs in MSTO-211H mesothelioma cancer cells compared with HCT-4012, a pleural
telomerase-transformed control cell line. In contrast, profiling of 16 MPM tissues (8 normal versus 8 tumor) revealed down-regulation
of 11 miRNAs in tumor tissue. Along with addressing the discrepancy between cells and tissue with respect to miRNA profiles, we
needed to devise a method to screen the possible candidates in order to focus on the most relevant miRNAs. One alternative is to
choose miRNAs that regulate genes known to be involved in the cause or progression of MPM. However, a search for miRNA targets
using the online TargetScan 4.2 program (http://www.targetscan.org) returned >1000 unique genes. This is expected, since
miRNAs are thought to regulate hundreds of genes and multiple miRNAs can regulate a common message. Therefore, we
decided to explore a novel screening strategy that integrates miRNA with messenger RNA (cDNA) expression profiles to narrow
down our list of miRNAs. We obtained cDNA profiles on the same cell lines and tissue samples using Affymetrix U133 Plus 2.0 chips.
Bioinformatic analysis using the MultiExperiment Viewer software (www.tm4.org/mev.html), involving data reduction techniques
(Correspondence Analysis), hierarchical clustering methods and Significance Analysis of Microarrays (SAM), indicated up-regulation of
~300 genes in MPM compared with normal tissues. Next, using a custom-designed search algorithm, we computed the number of
miRNAs regulating a common or different set of genes. Of the ~300 mRNAs up-regulated in MPM, only 32 are recognized by the
11 down-regulated miRNAs. Moreover, most of the miRNAs regulate single messages, while ~20% of the messages are regulated
by more than one miRNA. Interestingly, these targets include Ets variant 1 (ETV1) and protein kinase C epsilon (PRKCE),
which have not been evaluated in MPM but have been implicated in other cancers. Our next step is to validate our profiling studies using
real-time PCR and protein analysis methods. Thus, aside from selecting highly relevant miRNAs, our approach will also
enable discovery of novel genes based on their ability to be bound by single or multiple miRNAs. Supported by Grant: DoD
W81XWH-07-1-0306.
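
The integration step described above (intersecting predicted targets of down-regulated miRNAs with the up-regulated mRNAs, then counting how many miRNAs hit each message) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not describe the custom search algorithm itself, and the function names, miRNA names, gene names and pairings below are all hypothetical placeholders, not study data.

```python
from collections import defaultdict


def regulators_of_upregulated_genes(down_mirnas, up_genes, predicted_targets):
    """Map each up-regulated gene to the down-regulated miRNAs predicted to target it."""
    up = set(up_genes)
    regulators = defaultdict(set)
    for mirna in down_mirnas:
        # predicted_targets is assumed to be a {miRNA: [gene, ...]} lookup,
        # e.g. exported from a target-prediction tool.
        for gene in predicted_targets.get(mirna, ()):
            if gene in up:
                regulators[gene].add(mirna)
    return {gene: sorted(mirnas) for gene, mirnas in regulators.items()}


def fraction_with_multiple_regulators(regulators):
    """Fraction of targeted genes that are hit by more than one miRNA."""
    if not regulators:
        return 0.0
    multi = sum(1 for mirnas in regulators.values() if len(mirnas) > 1)
    return multi / len(regulators)


# Hypothetical toy input; none of these names or pairings come from the study.
predicted_targets = {
    "miR-x1": ["GENE_A", "GENE_C"],
    "miR-x2": ["GENE_A", "GENE_B"],
    "miR-x3": ["GENE_D"],
}
down_mirnas = ["miR-x1", "miR-x2"]
up_genes = ["GENE_A", "GENE_B", "GENE_E"]

regulators = regulators_of_upregulated_genes(down_mirnas, up_genes, predicted_targets)
print(regulators)
print(fraction_with_multiple_regulators(regulators))
```

In this toy input, GENE_A is targeted by two miRNAs and GENE_B by one, so half of the targeted messages have more than one regulator.
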
Abstract: 

Background:  Radiation  therapy  plays  an  important  role  in  achieving  local  control  of  many  solid  tumors  and 
prospective  clinical  trials  have  shown  enhanced  local  control  and  survival  when  radiation  is  combined  with 
chemotherapy  and/or  targeted  agents.  However,  there  are  currently  no  validated  biomarkers  that  predict  tumor 
response  to  radiation.  Therefore,  there  is  a  need  to  identify  predictive  biomarkers  to  refine  clinical  practice. 
Methods:  Reverse  phase  protein  arrays  (RPPA)  were  performed  to  measure  over  100  proteins  and  post- 
translational  modifications  in  the  NCI  60  cell  lines  under  serum-starved  and  serum-stimulated  conditions  to 
represent  the  chronic  hypoxia  and  reperfusion  taking  place  within  virtually  all  solid  tumors.  Protein  levels  were 
correlated  with  publicly  available  radiation  sensitivity  data  for  the  NCI  60.  T-tests  comparing  protein  expression 
and  activation  between  groups  of  radiosensitive  and  radioresistant  cells,  as  defined  by  the  surviving  fraction  at  2 
Gy  (SF2)  and  the  dose  of  radiation  producing  37%  cell  survival  (D0),  as  well  as  by  p53  mutational  status,  were  performed 
to  identify  candidate  protein  biomarkers  and  signaling  pathways  involved  in  cellular  responses  to  radiation. 
Continuous  variable  correlation  analyses  comparing  radiation  sensitivity  to  protein  expression  confirmed 
statistically  significant  correlations.  Results:  The  basal  expression  of  more  than  10  proteins,  including  EGFR,  Src 
and  IGFR,  demonstrated  a  statistically  significant  correlation  with  radiation  resistance  (p-value  <  0.05).  Src  and 
Akt,  among  other  proteins,  also  had  statistically  significant  changes  in  protein  expression  between  serum-starved- 
then-stimulated  conditions  that  correlated  with  radiation  resistance.  These  identified  proteins  represent  canonical 
signaling  pathways  with  multiple  protein  effectors,  some  of  which  have  the  additional  potential  to  be 
therapeutically  targeted.  Conclusions:  This  work  identifies  candidate  proteins  and  signaling  pathways  associated 
with  the  modulation  of  radiation  sensitivity  that  may  serve  as  biomarkers  for  tumor  response  to  radiation  therapy 
as  well  as  targets  for  therapeutic  intervention.  (Supported  by  P50  CA70907,  W81XWH-07-1-0306  01) 
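
The group comparison described in the Methods above (splitting cell lines into radiosensitive and radioresistant groups by SF2, then comparing a protein's level between the groups) might look like the sketch below. The abstract says only that t-tests were used; the Welch (unequal-variance) form, the SF2 cutoff of 0.5, and all cell-line names and values here are assumptions made for illustration.

```python
from statistics import mean, variance


def split_by_sf2(sf2_by_line, cutoff):
    """Partition cell-line names into (sensitive, resistant) by an SF2 cutoff."""
    sensitive = sorted(name for name, v in sf2_by_line.items() if v < cutoff)
    resistant = sorted(name for name, v in sf2_by_line.items() if v >= cutoff)
    return sensitive, resistant


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se_a = variance(group_a) / len(group_a)  # squared standard error of mean A
    se_b = variance(group_b) / len(group_b)  # squared standard error of mean B
    t = (mean(group_a) - mean(group_b)) / (se_a + se_b) ** 0.5
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(group_a) - 1) + se_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical toy data: SF2 values and one protein's array signal per cell line.
sf2_by_line = {"line1": 0.25, "line2": 0.30, "line3": 0.35,
               "line4": 0.60, "line5": 0.70, "line6": 0.80}
protein_by_line = {"line1": 1.0, "line2": 2.0, "line3": 3.0,
                   "line4": 2.0, "line5": 3.0, "line6": 4.0}

sensitive, resistant = split_by_sf2(sf2_by_line, cutoff=0.5)  # cutoff is an assumption
t_stat, df = welch_t(
    [protein_by_line[name] for name in resistant],
    [protein_by_line[name] for name in sensitive],
)
print(sensitive, resistant, t_stat, df)
```

A positive t statistic here means the protein is higher in the resistant group; converting it to a p-value requires the t distribution with `df` degrees of freedom, which a statistics package would supply.
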
Reverse  Phase  Protein  Arrays  Reveal  Biomarkers  of  Radiation  Resistance  in  Head  and 
Neck  and  Lung  Cancer  Cell  Lines 

John  S  Yordy,  Byers  LA,  Davies  M,  Molkentine  D,  Raju  U,  Mills  G,  Minna  J,  Coombes  K, 
Ang  KK,  Heymach  JV 

Background:  Radiation  therapy  is  used  to  improve  local  control  for  many  solid  tumors. 
Prospective  clinical  trials  have  shown  enhanced  local  control  and  survival  when 
chemotherapy  and/or  targeted  agents  are  combined  with  radiation.  However,  there  are 
currently  no  validated  biomarkers  predicting  tumor  response  to  radiation  or  combined 
therapy  and  these  are  essential  to  improve  clinical  practice. 

Methods:  Reverse  phase  protein  arrays  were  performed  to  measure  120  proteins  and 
post-translational  modifications  in  9  head  and  neck  (HN)  and  20  lung  cancer  cell  lines. 
Subsets  of  radiation-sensitive  and  radiation-resistant  cell  lines  were  identified  based  on 
the  surviving  fraction  at  2  Gy.  Protein  levels  were  correlated  with  radiation  sensitivity 
data  using  t-tests  to  identify  proteins  or  phosphoproteins  differentially  expressed 
between  groups  of  radiosensitive  and  radioresistant  cells  to  identify  candidate  protein 
biomarkers  and  signaling  pathways  involved  in  cellular  responses  to  radiation. 
Continuous  variable  correlation  analyses  comparing  radiation  sensitivity  to  protein 
expression  confirmed  statistically  significant  correlations. 
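
The continuous-variable analysis mentioned above can be illustrated with a Pearson correlation between SF2 and a protein's expression across cell lines. This is a minimal sketch: the abstract does not say which correlation coefficient was used, so Pearson's r is an assumption, and the values below are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return float("inf") if r > 0 else float("-inf")
    return r * sqrt((n - 2) / (1.0 - r * r))


# Hypothetical values: SF2 and one protein's array signal for four cell lines.
sf2 = [0.2, 0.4, 0.6, 0.8]
protein = [1.0, 2.0, 2.0, 3.0]

r = pearson_r(sf2, protein)
t_stat = correlation_t(r, len(sf2))
print(r, t_stat)
```

Because it uses every cell line's SF2 value rather than a sensitive/resistant split, this check does not depend on where a group cutoff is drawn.
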

Results:  The  expression  of  more  than  10  proteins  was  significantly  correlated  with 
radiation  resistance  in  the  HN  and  lung  cancer  cell  lines,  including  Src,  EGFR,  IGFR  and 
receptor  tyrosine  kinase  downstream  signaling  effectors  such  as  phosphorylated  PI3K, 
STAT  family  members  and  MEK1  (p-value  <  0.05).  These  identified  proteins  are  part  of 
known  signaling  pathways  consisting  of  multiple  protein  effectors.  Some  of  these 
identified  proteins,  as  well  as  other  proteins  within  the  same  signaling  cascade,  have  the 
additional  potential  to  be  therapeutically  targeted. 

Conclusion:  These  findings  suggest  receptor  tyrosine  kinases  and  their  dependent 
downstream  signaling  pathways  are  associated  with  radiotherapy  resistance.  This  work 
identifies  candidate  proteins  and  signaling  pathways  associated  with  the  modulation  of 
radiation  sensitivity  that  may  serve  as  biomarkers  for  tumor  response  to  radiation 
therapy  as  well  as  targets  for  therapeutic  intervention  in  HN  and  lung  cancers. 
(Supported  by  P50  CA70907,  P50  CA97007,  W81XWH-07-1-0306  01) 


Profiling in pharmacologically re-expressed microRNAs by 5-azacytidine and
SAHA identified a metastasis associated miR-148b in malignant pleural
mesothelioma cell lines

Corvalan A, Suraokar M, Gazdar A, Moran C, Raso G, Mehran R, Tsao A,
Wistuba I.

Background. MicroRNAs (miRNAs) have emerged as key players in human
carcinogenesis. Recently it has been shown that some miRNAs can be epigenetically
up-regulated by aberrant hypermethylation in human cancer. Malignant pleural
mesothelioma (MPM) is a highly malignant neoplasm with different histological
subtypes. To explore the role of epigenetically mediated up-regulation of miRNAs in MPM,
we performed pharmacological unmasking of miRNA expression in cell lines.

Methods. Five cell lines, including one normal mesothelial line (Met5A) and
four MPM lines (epithelioid H2452, biphasic H211, and unclassified H28 and H2052), were
treated in vitro with the demethylating agent 5-aza-cytidine (5-Aza; 1 µM) and SAHA
(2.5 µM) for 96 hrs. After RNA extraction (Trizol), miRNA profiling was performed with the
Agilent human microRNA kit v2.

Results. The total number of miRNAs up-regulated (two-fold) after treatment was 299 (51%) in
the normal mesothelial Met5A cell line, and lower in the malignant cell lines: 171 (29%) in
H2452, 79 (13.5%) in H211, 55 (9.4%) in H28, and 56 (9.6%) in H2052. We detected
that 167 (55.9%) miRNAs were exclusively up-regulated in Met5A, 56 (32.7%) in
H2452, 21 (26.6%) in H211, 16 (29.1%) in H28, and 18 (32.1%) in H2052. Among all
unique miRNAs, only 17 (let-7b, let-7c, let-7f-2, miR-302c, miR-328, miR-510,
miR-125b-1, miR-16-1, miR-223, miR-302b, miR-383, miR-551b, miR-922, miR-148a,
miR-18b, miR-302d, miR-326) have been previously associated with human carcinogenesis.
Interestingly, one of these miRNAs (miR-148a) has been associated with a microRNA
tumor metastasis signature.
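
The counts of "exclusively up-regulated" miRNAs above amount to a set difference per cell line: miRNAs passing the two-fold threshold in one line and in no other. A minimal sketch follows; the function names, cell-line labels, miRNA names and fold changes are hypothetical, and only the two-fold threshold is taken from the abstract.

```python
def upregulated(fold_change, threshold=2.0):
    """Names of miRNAs whose treated/untreated fold change meets the threshold."""
    return {name for name, fc in fold_change.items() if fc >= threshold}


def exclusive_per_line(up_by_line):
    """For each cell line, the miRNAs up-regulated in that line and in no other."""
    exclusive = {}
    for line, mirnas in up_by_line.items():
        others = set()
        for other_line, other_mirnas in up_by_line.items():
            if other_line != line:
                others |= other_mirnas
        exclusive[line] = mirnas - others
    return exclusive


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Hypothetical fold changes for two cell lines (not study data).
fold_changes = {
    "lineA": {"miR-x1": 3.0, "miR-x2": 2.5, "miR-x3": 1.2},
    "lineB": {"miR-x1": 2.1, "miR-x3": 4.0, "miR-x4": 0.9},
}
up_by_line = {line: upregulated(fc) for line, fc in fold_changes.items()}
exclusive = exclusive_per_line(up_by_line)
print(exclusive)
print(percent(167, 299))  # the reported Met5A proportion, 55.9
```

The last line reproduces the reported Met5A figure (167 exclusive out of 299 up-regulated miRNAs).
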

Discussion. The number of total and unique miRNAs up-regulated after 5-Aza and SAHA
was lower in MPM cell lines than in the normal Met5A cell line. Up-regulation of
unique miRNAs was associated with cell lines obtained from some specific
subtypes of MPM. The identification of the metastasis-associated miR-148a suggests a
potential biomarker for metastasis in this highly malignant neoplasm. Further studies,
including the analysis of tissue specimens, are needed to validate these results.

Grant  support:  PROSPECT  DoD  W81XWH-07-1-0306. 


Secreted  Cytokine  and  Angiogenic  Factor  (CAF)  profiles  associated  with 
age  and  sex  in  NSCLC. 

Matthew H. Herynk1, Emer Hanrahan1, Heather Yan Lin2, Tina Cascone1,
Shaoyu Yan3, Lauren Byers4, John Yordy5, J. Jack Lee2, Hai T. Tran1, and John
V. Heymach1.

Departments of Thoracic/Head and Neck Medical Oncology1, Biostatistics and
Mathematics2, Pharmacy Pharmacology Research3, Cancer Medicine4, and
Radiation Oncology5.

Background:  Subgroup  analyses  from  recent  clinical  trials  in  non-small  cell  lung 
cancer  (NSCLC)  suggest  therapeutic  efficacy  in  a  sex-specific  manner  from 
drugs  such  as  bevacizumab  and  vandetanib.  These  differences  suggest  that 
factors  inherent  in  the  basic  male/female  biology  may  impact  growth  and  survival 
mechanisms  in  NSCLC  tumors.  We  sought  to  identify  if  there  are  sex-specific 
differences  in  secreted  cytokine  and  angiogenic  factors  (CAFs)  in  NSCLC  cell 
lines  and  patient  samples. 

Methods:  Thirty-five  CAFs  were  measured  by  multiplex  bead  suspension  arrays 
(MBSA)  and  ELISAs  from  pre-treatment  plasma  (N=123)  and  serum  (N=151) 
collected from patients with stage IIIB/IV NSCLC participating in randomized
phase 2 trials of vandetanib alone or in combination with chemotherapy. MBSA
were used to measure the levels of 48 secreted CAFs in conditioned media from
36 NSCLC cell lines (female N=17, male N=19). Subconfluent cells were
serum-starved overnight and the media was changed; 24 hours later, conditioned media
was collected and the cells were lysed. Measured CAF levels were normalized
to total protein from whole cell lysates.
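
The normalization step above can be illustrated with a small helper. The abstract does not give the exact formula or units, so this sketch assumes the secreted amount (concentration times media volume) is divided by total lysate protein; the function name, parameter names and example values are hypothetical.

```python
def normalize_to_protein(caf_pg_per_ml, media_volume_ml, total_protein_mg):
    """Secreted CAF per mg of total cellular protein (pg/mg).

    Assumption: the secreted amount is concentration x media volume, and it is
    divided by total protein measured in the whole cell lysate.
    """
    if total_protein_mg <= 0:
        raise ValueError("total protein must be positive")
    return caf_pg_per_ml * media_volume_ml / total_protein_mg


# Hypothetical example: 100 pg/ml in 2 ml of media, 0.5 mg lysate protein.
print(normalize_to_protein(100.0, 2.0, 0.5))
```

Normalizing this way keeps cell lines that reached different densities comparable, since a dish with more cells would otherwise appear to secrete more.
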

Results: Univariate analysis of serum and plasma samples revealed statistically
significant differences in the concentrations of 18 CAFs between male and
female patients, with most being higher in females, including plasma IL-15 (mean
1193 vs. 291 pg/ml; P=0.0009), sIL-2R (mean 1413 vs. 577 pg/ml; P=0.004),
MIG (CXCL-9) (mean 184 vs. 67 pg/ml; P=0.0007), and macrophage
inflammatory protein-1 alpha (MIP-1alpha, CCL3) (mean 319 vs. 108 pg/ml;
P=0.0067). Conditioned media from 36 NSCLC cell lines were analyzed for levels
of secreted CAFs. Nine CAFs that differed significantly in patient
samples were also present in the cell line analyses, and two factors, MIP-1alpha
and intercellular adhesion molecule-1 (ICAM-1), also showed increased
levels in female versus male cell lines, but these differences did not reach
statistical significance. While 18 CAFs were statistically significant in patient
samples, no individual factors were statistically significant in conditioned media
from cell lines. Subgroup analysis of female cell lines revealed an age
association with 26 secreted CAFs. The majority were
upregulated in cell lines originally derived from patients >50 y/o (N=10) vs. <50
(N=5), including IL-15 (2.05 vs. 1.27 pg/ml, P=0.011), MIG (0.12 vs. 0.095 pg/ml,
P=0.033), EGF (14.21 vs. 11.25 pg/ml, P=0.034), and ICAM-1 (11.08 vs. 7.66
pg/ml, P=0.057).


Conclusions:  Significant  CAF  differences  were  observed  when  male  and  female 
patient  samples  and  conditioned  media  from  cell  lines  were  analyzed,  thus 
suggesting  an  important  role  for  age  and  sex  in  the  secreted  CAF  profiles  of 
NSCLC.  Because  EGFR  inhibitors  have  shown  preferential  efficacy  for  females, 
and  hormone  signaling  varies  between  male  vs.  female  populations  as  well  as 
between  younger  vs.  older  women,  the  contributions  of  EGFR  and  hormone 
signaling  to  the  sex-different  secreted  factors  are  being  further  investigated. 

These  secreted  factors  are  involved  in  a  number  of  signaling  networks  and  thus 
may  contribute  to  a  broad  range  of  effects  on  tumor  growth,  metastases,  and 
therapeutic  efficacy  of  angiogenesis  inhibitors  and  other  targeted  agents. 


Immunohistochemical  Expression  of  Membrane  Transporters  Correlates  with  Histology 
of  Non-Small  Cell  Lung  Carcinoma.  Maria  Nunez,  Carmen  Behrens,  Heather  Lin, 
Ludmila  Prudkin,  Milind  Suraokar,  Denise  M.  Woods,  Luc  Girard,  John  Minna,  Jack  Lee, 
Wayne  Hofstetter,  Wilbur  Franklin,  Cesar  A.  Moran,  Waun  Ki  Hong, 
David  Stewart,  Ignacio  I.  Wistuba. 

Membrane  transporters  Folate  receptor  alpha  (FOLR1),  Reduced  folate  carrier  1  (RFC1), 
Copper  transporter  receptor  1  (CTR1),  Glucose  transporter  4  (GLUT4)  and  RHOA  regulate  uptake  of 
molecules  and  drugs  inside  the  cell.  FOLR1  and  RFC1  are  over  expressed  in  epithelial 
tumors  and  are  potential  therapeutic  targets  and  tumor  biomarkers;  however  there  is 
limited  information  on  the  expression  of  these  receptors  in  non-small  cell  lung  carcinoma 
(NSCLC). 

Immunohistochemical  (IHC)  protein  expression  of  FOLR1,  RFC1,  CTR1,  GLUT4  and 
RHOA  was  examined  in  320  surgically  resected  NSCLCs  placed  in  tissue  microarrays, 
including  202  adenocarcinomas  and  110  squamous  carcinomas,  and  correlated  with 
patients’  clinico-pathological  characteristics.  A  semiquantitative  IHC  score  was  obtained 
assessing  intensity  of  immunostaining  and  percentage  of  positive  tumor  cells. 
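
A semiquantitative score that combines staining intensity with the percentage of positive tumor cells is often computed as their product. The abstract does not state the formula, so the sketch below is an assumption: it takes intensity graded 0 to 3 and percentage 0 to 100, giving a score from 0 to 300, and the function name is hypothetical.

```python
def ihc_score(intensity, percent_positive):
    """Semiquantitative IHC score as intensity (0-3) x percent positive cells (0-100).

    Assumed scoring scheme, not confirmed by the abstract; result ranges 0-300.
    """
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    return intensity * percent_positive


# Hypothetical core: moderate staining (2) in 40% of tumor cells.
print(ihc_score(2, 40))
```

A product score of this kind distinguishes weak staining in most cells from strong staining in a few, which a positive/negative call would not.
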

The  pattern  of  IHC  expression  varied  in  malignant  cells,  with  FOLR1,  RFC1  and 
GLUT4  expressed  in  the  membrane  and  cytoplasm,  CTR1  expressed  in  the  cytoplasm 
and  nucleus,  and  RHOA  expressed  only  in  the  cytoplasm.  In  all  cases  expression  in  tumor 
cells  was  higher  than  in  non-malignant  lung  epithelial  cells.  Tumor  stroma  IHC 
expression  was  frequently  detected,  especially  in  endothelial  cells,  lymphocytes, 
macrophages  and  fibroblasts.  Adenocarcinomas  showed  significantly  higher  expression 
compared  with  squamous  cell  carcinoma  for  most  markers,  including  membrane 
(P<0.001) and cytoplasmic (P<0.001) FOLR1, cytoplasmic (P<0.001) and nuclear
(P0.004) CTR1, and cytoplasmic RHOA (P<0.001). Female NSCLC patients had
significantly higher expression of membrane and cytoplasmic FOLR1 (P=0.01) compared
with male patients. Ever-smokers demonstrated significantly lower expression of
membrane (P<0.001) and cytoplasmic FOLR1 (P<0.002), and higher expression of
membrane (P=0.04) and cytoplasmic (P=0.03) GLUT4, and membrane RFC1 (P=0.01),
compared with never-smokers. In adenocarcinomas, the presence of EGFR mutations
correlated with higher expression of membrane FOLR1 (P<0.002), and KRAS mutation
with higher expression of membrane GLUT4 (P0.004) and lower expression of nuclear
CTR1 (P=0.02). Finally, squamous carcinomas showed higher positive endothelial cell
expression of FOLR1 (P=0.00001) than adenocarcinomas.

We  conclude:  1.  membrane  transporter  proteins  are  over  expressed  in  NSCLC  compared 
to  normal  lung  epithelium;  2.  significant  differences  were  found  between 
adenocarcinomas  and  squamous  lung  cancer  in  both  tumor  cells  and  the  tumor 
microenvironment;  3.  differences  were  found  in  tumors  of  males  and  females,  between 
tumors  from  never  and  ever  smokers,  and  tumors  with  EGFR  or  KRAS  mutations.  The 
different  patterns  of  transporter  expression  may  explain  the  superior  response  of  NSCLC 
patients  with  adenocarcinoma  histology  to  pemetrexed.  Supported  by  grants  US  DoD 
W81XWH-07-1-0306,  and  UT-Lung  SPORE  P50CA70907. 


Enriched  Tumor  Expression  of  Folate  Transporters  Correlates  With  Adenocarcinoma  Histology  Type, 
Female  Gender  and  Presence  of  EGFR  Mutation  in  Non-Small  Cell  Lung  Carcinoma 

Author  Block:  Maria  Ines  Nunez,  Carmen  Behrens,  Denise  M.  Woods,  Heather  Lin,  Milind  Suraokar,  Luc 
Girard,  John  Minna,  Jack  Lee,  W  Hofstetter,  Wilbur  Franklin,  Cesar  A.  Moran,  Waun  K.  Hong,  David  J. 
Stewart,  Ignacio  I.  Wistuba. 

Pathology  Department  -  UT  M.D.  Anderson  Cancer  Center,  Houston,  TX,  Biostatistics  Department  -  UT 
M.D.  Anderson  Cancer  Center,  Houston,  TX,  Hamon  Center  -  UT  Southwestern  Medical  Center,  Dallas, 
TX,  Thoracic  Surgery  Department  -  UT  M.D.  Anderson  Cancer  Center,  Houston,  TX,  University  of 
Colorado  Cancer  Center,  Denver,  CO,  Thoracic/Head  and  Neck  Medical  Oncology  Department  -  UT  M.D. 
Anderson  Cancer  Center,  Houston,  TX 

Background:  Membrane  bound  folate  receptor  alpha  (FOLR1)  and  transmembrane  Reduced  folate 
carrier  1  (RFC1)  regulate  uptake  of  folate  as  well  as  folate  linked  conjugates  inside  the  cell.  FOLR1  and 
RFC1  are  over  expressed  in  epithelial  primary  and  metastatic  tumors  and  are  promising  therapeutic 
targets and tumor biomarkers. Because information on the expression of these receptors in
non-small cell lung carcinoma (NSCLC) is limited, we studied the immunohistochemical (IHC) protein expression of
these receptors in a large set of tumors and correlated our findings with patients' clinicopathologic
features. 

Methods:  IHC  protein  expression  of  FOLR1  and  RFC1  was  examined  in  320  surgically  resected  NSCLCs 
placed  in  tissue  microarrays,  including  202  adenocarcinomas  and  110  squamous  carcinomas,  and 
correlated  with  patients'  clinico-pathological  characteristics.  A  semiquantitative  IHC  score  was  obtained 
assessing  intensity  of  immunostaining  and  percentage  of  positive  tumor  cells. 

Results:  The  pattern  of  IHC  expression  varied  in  malignant  cells,  with  FOLR1  and  RFC1  expressed  in  the 
membrane  and  cytoplasm.  In  all  cases  expression  in  tumor  cells  was  higher  than  in  non-malignant  lung 
epithelial  cells.  Tumor  stroma  IHC  expression  was  frequently  detected,  especially  in  endothelial  cells, 
lymphocytes,  macrophages  and  fibroblasts.  Adenocarcinomas  showed  significantly  higher  expression 
compared  with  squamous  cell  carcinoma  for  membrane  (P<0.001)  and  cytoplasmic  (P<0.001)  FOLR1. 
Interestingly,  these  protein  expression  findings  are  supported  by  4  published  gene  expression  datasets, 
collectively  profiling  about  400  tumor  samples,  which  show  that  FOLR1  mRNA  is  expressed  at  higher 
levels  in  adenocarcinomas  compared  to  squamous  cell  carcinomas.  Female  NSCLC  patients  had 
significantly  higher  expression  of  membrane  and  cytoplasmic  FOLR1  (P=0.01)  compared  with  male 
patients. Ever-smokers demonstrated significantly lower expression of membrane (P<0.001) and
cytoplasmic FOLR1 (P<0.002), and higher expression of membrane RFC1 (P=0.01), compared with
never-smokers. In adenocarcinomas, the presence of EGFR mutations correlated with higher expression of
membrane FOLR1 (P<0.002). Finally, squamous carcinomas showed higher positive endothelial cell
expression of FOLR1 (P=0.00001) than adenocarcinomas.

Conclusion: 1. FOLR1 and RFC1 membrane transporter proteins are overexpressed in NSCLC compared
to normal lung epithelium; 2. significant differences were found between adenocarcinomas and
squamous lung cancer in both tumor cells and the tumor microenvironment; 3. differences were found
in tumors of males and females, between tumors from never and ever smokers, and tumors with EGFR
mutations. The different patterns of transporter expression may explain the superior response of NSCLC
patients with adenocarcinoma histology to pemetrexed.

Supported  by  grants  US  DoD  W81XWH-07-1-0306,  and  UT-Lung  SPORE  P50CA70907. 


Importance  of  Histopathology  Quality  Control  of  Non-Small  Cell  Lung  Cancer  Tissue 
Specimens  for  DNA/RNA  Extraction  and  Profiling  Analysis.  G.  Raso,  A.  Corvalan,  C. 
Behrens,  A.  Basey,  G.  Mendoza,  J.  Roth,  C.  Moran,  I.  Wistuba. 

Introduction: High-throughput molecular profiling technologies require good-quality
tumor tissue samples and nucleic acid products. To achieve these high standards in our
tissue bank, we have in place a series of quality control activities, including detailed
pathology analysis of frozen tissue specimens.

Methods:  From  more  than  1,500  primary  NSCLC  tumor  frozen  samples  collected  from 
1997  to  2007  we  selected  a  subset  of  492  cases  stored  in  liquid  nitrogen  vapor  phase. 
DNA  and  RNA  were  extracted  and  quantitated  using  a  bioanalyzer  system  (Agilent). 
Before  DNA/RNA  extraction,  we  performed  a  detailed  histopathology  quality  control  of 
the  frozen  tissues  to  assess  percentage  of  tumor  tissue,  tumor  cells,  normal  tissue, 
necrosis,  fibrosis  and  inflammation. 

Results:  Tumor  >70%  was  present  in  82%  of  the  NSCLC  specimens.  Tumor  cell  content 
>50%  was  present  in  64%  (n=284  cases)  of  NSCLCs,  being  68%  in  adenocarcinomas 
(n=211  cases)  and  54%  (n=73  cases)  in  squamous  cell  carcinomas.  Thirty-eight  percent 
of  adenocarcinomas  and  48%  of  squamous  cell  carcinomas  showed  100%  tumor  tissue 
content.  Ten  to  30%  of  normal  parenchyma  was  present  in  43%  of  both  histologies.  From 
311  tumor  samples  in  which  RNA  integrity  number  (RIN)  was  obtained,  RIN>8  was 
found  in  26%  and  RIN>5  in  51%. 
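
The thresholds reported above (tumor tissue >70%, tumor cell content >50%, and RIN >8 or >5) can be applied per specimen as in the sketch below. The abstract reports each measure separately and does not say how they were combined into an accept/reject decision, so the record layout, names and tier labels here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrozenSpecimen:
    tumor_tissue_pct: float          # percentage of tumor tissue in the section
    tumor_cell_pct: float            # percentage of malignant cells
    rin: Optional[float] = None      # RNA integrity number, if measured


def rin_tier(rin):
    """Bucket an RNA integrity number using the two cutoffs quoted in the abstract."""
    if rin is None:
        return "not measured"
    if rin > 8:
        return "RIN>8"
    if rin > 5:
        return "RIN>5"
    return "RIN<=5"


def qc_summary(specimen):
    """Flag a specimen against the tumor tissue, tumor cell and RIN thresholds."""
    return {
        "tumor_tissue_ok": specimen.tumor_tissue_pct > 70,
        "tumor_cell_ok": specimen.tumor_cell_pct > 50,
        "rin_tier": rin_tier(specimen.rin),
    }


# Hypothetical specimen: enough tumor tissue, but low tumor cell content.
print(qc_summary(FrozenSpecimen(tumor_tissue_pct=80, tumor_cell_pct=45, rin=6.2)))
```

Keeping the three flags separate mirrors the abstract's observation that a specimen can meet the tumor tissue standard while still falling short on tumor cell content.
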

Conclusions:  The  required  minimum  standard  for  tumor  content  (>70%)  was  achieved  in 
most  of  our  NSCLC  cases.  However,  tumor  cell  content  was  lower,  especially  in 
squamous  cell  carcinoma  histology.  The  difference  found  in  tumor  cell  content  between 
squamous  cell  carcinoma  and  adenocarcinoma  reflects  the  morphological  heterogeneity 
of  NSCLC.  This  work  has  been  supported  by  the  US  Department  of  Defense  PROSPECT 
and  UT-Lung  SPORE  grants. 


Keap1  and  Nrf2  Expression  in  Non-Small  Cell  Lung  Carcinomas  Correlates  with 
Clinicopathological  Features 

Solis  LM,  Behrens  C,  Bekele  BN,  Suraokar  M,  Ozburn  N,  Moran  CA,  Minna  J,  Stewart 
D,  Swisher  S,  Corvalan  AH,  Wistuba  I. 

UT-M.D.  Anderson  Cancer  Center,  Houston  TX,  Hamon  Center  for  Therapeutic 
Oncology  Research-Simmons  Cancer  Center,  UT  Southwestern  Medical  Center,  Dallas 

Most non-small cell lung carcinomas (NSCLC) demonstrate resistance to chemotherapy.
Nuclear factor erythroid-2 related factor 2 (Nrf2) is a transcription factor associated with
in vitro resistance to chemotherapy. Kelch-like ECH-associated protein 1 (Keap1) is a
cytoplasmic repressor of Nrf2. KEAP1 inactivation is a relatively frequent genetic
alteration in NSCLC, and leads to Nrf2 activation (Singh et al, PLoS Med 3:e240, 2006).
We investigated the immunohistochemical (IHC) expression of nuclear Nrf2 and
cytoplasmic Keap1 proteins in 304 surgically resected NSCLC tissues in tissue
microarrays (adenocarcinomas, n=190; squamous cell carcinomas, n=114). We correlated
those findings with patients’ clinicopathological features, and in adenocarcinomas with
EGFR and KRAS mutations. We also examined the expression of Nrf2 and Keap1 using
whole tissue sections in 79 NSCLC tumors (36 chemo-naive and 43 treated with
neoadjuvant chemotherapy). We detected Nrf2 expression in 26% (77/299) of NSCLCs,
being significantly higher in squamous cell carcinoma (43/112, 38%) compared with
adenocarcinoma (34/188, 18%; P=0.0001). In adenocarcinomas, Nrf2 was not expressed
in EGFR mutant (0/23) compared with wild-type tumors (31/145, 21%; P=0.009).
Keap1 expression score was significantly higher in squamous cell carcinoma compared
with adenocarcinoma (P0.0001). In patients with NSCLC stage I/II, who did not receive
adjuvant or neoadjuvant treatment, Nrf2 overexpression significantly correlated with poor
overall survival in multivariate analysis (HR=2.468; 95%CI 1.468, 4.151; P=0.0007). In
patients with squamous cell carcinoma histology, low Keap1 expression correlated with
poor overall survival (HR=0.479; 95%CI 0.260, 0.882; P=0.018). We found that Nrf2
expression in tumor tissue sections is heterogeneous and ranges from 5-80%
(mean=27%) of tumor cells. NSCLC resected from patients treated with neoadjuvant
chemotherapy showed Nrf2 expression in 28% (12/43) of NSCLC tumors, being higher
in squamous cell carcinoma (5/11, 45%). KEAP1 mutation (exons 2-5) was detected in
1/20 tumors examined. Normal bronchial epithelia adjacent to NSCLC tumors did not
show Nrf2 expression, suggesting that no field effect phenomenon on Nrf2 expression is
present. We conclude that: 1. increased expression of Nrf2 and decreased expression of
Keap1 are relatively frequent abnormalities in NSCLC, especially in squamous cell
carcinoma histology; and, 2. altered IHC expression of these markers correlates with
NSCLC patients’ outcome. The identification of the subset of patients with abnormal
expression of Nrf2 may be important for better selection of treatment in NSCLC.
(Supported by grants US DoD W81XWH-07-1-0306, and UT-Lung SPORE
P50CA70907).
Abstract: 

Background:  The  optimal  multi-modality  treatment  for  resectable  malignant  pleural  mesothelioma  (MPM) 
remains  unknown.  We  designed  a  biomarker-based  neoadjuvant  trial  from  our  preclinical  studies  showing  that 
dasatinib,  a  multi-targeted  Src  kinase  inhibitor,  has  activity  against  MPM  and  target  specificity  to  Src  Tyr419. 
Methods:  Untreated  MPM  patients  underwent  extended  surgical  staging  (ESS)  with  multiple  biopsies  to  account 
for  tumor  heterogeneity,  lymph  node  status  and  to  rule  out  sarcomatoid  features.  If  deemed  a  surgical  candidate 
for  either  pleurectomy/decortication  (P/D)  or  extrapleural  pneumonectomy  (EPP),  patients  received  4  weeks  of 
oral  dasatinib  (70  mg  BID)  followed  by  P/D  or  EPP.  If  either  a  radiographic  or  molecular  response 
(dephosphorylation  of  Src  Tyr419  in  tumor)  is  achieved,  an  additional  2  years  of  dasatinib  maintenance  after 
adjuvant  radiotherapy  and  systemic  chemotherapy  is  given.  The  primary  endpoint  of  this  trial  was  biomarker 
modulation  of  Src  Tyr419.  Secondary  endpoints  included  response,  survival,  safety/toxicity,  and  biomarker 
modulation  in  tumor/serum/platelets/pleural  effusion.  The  total  planned  sample  size  is  24  to  detect  a  50% 
reduction  in  positive  p-Src  Tyr419  expression  with  80%  power,  one-sided  10%  type  I  error  rate,  and  10% 
inevaluable  rate.  Results:  To  date,  ten  patients  have  registered  on  the  trial  (4/08  -  12/08);  six  have  successfully 
completed  the  ESS,  neoadjuvant  dasatinib,  and  P/D  (n=3)  or  EPP  (n=3).  Two  patients  are  still  receiving 
neoadjuvant dasatinib, and 2 patients were deemed not to be surgical candidates: one due to a rapid decline in PS
and one who was found to have bilateral mesothelioma. The main side effects of dasatinib were grade 1-2: anemia,
nausea,  vomiting,  anorexia,  electrolyte  abnormalities,  fatigue,  and  anxiety.  Grade  3  toxicities  included 
hyperkalemia  (1),  infection  -  pneumonia  (1),  and  hypoxia  (1).  There  were  no  grade  4-5  toxicities.  Post-surgical 
grade  3  toxicity  included  anemia,  electrolyte  abnormalities,  arrhythmia,  HTN,  and  pleural  effusion;  one  grade  4 
episode  of  hyperglycemia  was  seen.  Conclusions:  This  study  demonstrates  that  biomarker-based  neoadjuvant 
MPM  trials  with  novel  agents  are  feasible.  Updated  clinical  and  translational  correlative  results  will  be  presented. 